Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!van-bc!ubc-cs!uw-beaver!cornell!ken
From: ken@gvax.cs.cornell.edu (Ken Birman)
Newsgroups: comp.sys.isis
Subject: Re: Fast causal multicast
Message-ID: <48346@cornell.UUCP>
Date: 14 Nov 90 14:53:39 GMT
Sender: nobody@cornell.UUCP
Reply-To: scott@amethyst.omg.ORG
Distribution: comp
Organization: Cornell Univ. CS Dept, Ithaca NY
Lines: 170

>From scott@amethyst.omg.ORG Tue Nov 13 23:10:48 1990
>Subject: Re: fast causal broadcast

You write:

>  But, say that I build an application in the style that the LINDA people
>  advocate.
>
>  I have a computational thread that writes some parameters on a
>  blackboard (some stuff that you might not need to see, but if you do,
>  it costs a lot to get it wrong).  Then, it puts a bunch of work
>  requests into a bag of work requests.
>
>  Server programs pull out work requests and start computing.  They can
>  check the blackboard if desired.
>
>  Now, I would argue that:
>
>   1) [... ]  So, the bboard update is asynchronous, and the task bag
>      update now creates a race condition.
>
>      Specifically, will the server see the bboard update when it looks?
>      With a causal guarantee, it will.

It is exactly my contention that because there is this causal
relationship between the blackboard update and the work-request-bag
update the application developer must EXPLICITLY handle this
relationship by creating a process group containing the PG members from
both the blackboard server and the work-request-bag server.  The
original two PGs still exist; they may be used by clients that are not
concerned with inter-PG causality.  This scheme assumes that PG members
simply discard messages that they don't understand.
 
You might argue that the increased group-size would incurr a substantial
performance penalty.  Under a pessimistic protocol it probably
would.  However with an optimistic protocol better suited to the
reliability characteristics of the pervasive CSMA/CD ethernet or
Token Ring physical layers there would be little performance penalty.
PG members would only have to adjust their VT/LTs.
 
I see this problem as a general concurrency problem, rather than
a distributed system problem.  A concurrent programer faced with such
a problem would use some sort of MUTEX mechanism to ensure that
the causal relationship between the two processes is respected.
 
To pursue this chain of reasoning somewhat further I see the
field of distributed computing as an amalgam of two fields:
concurrent computing, and communication protocols.  Many
approaches seem to do well in one regard but not in the other.
For example the Hermes system (R. E. Strom et al, IBM) handles
concurrency nicely but sidesteps the communication issues.
Isis seems to be just the opposite (somewhat unavoidably as
it is implemented in C).  My theoretical inclination is to start with a
good concurrent language, Actors, as proposed by Agha and Hewitt,
is my current favorite, and install useful communication semantics
under it.  My practical inclination is of course to do something
in the  C/Unix environment so people will use it.
 
>   2) This is not an uncommon paradigm.
 
Agreed.  However, I dispute the assumption I see lurking behind this
statement, that the benefits of concurrency (or any other stye of
programming) may be obtained "for free" - ie.  that through clever
systems programming on the part of a few, J. Random Hacker may write
generally useful concurrent programs in blissful ignorance of the
fundamentals of concurrent programming.  New tools demand new skills.
As an engineer, my concern is to develop a set of tools that will allow
other engineers, skilled at concurrent programming, to develop
distributed applications.  Developing this set of tools may require the
abandonment or revision of some common paradigms just as functional and
OO programming have.  Further Kuhnian rantings elided...
 
>   3) Fault-tolerance [... ]
 
I have to think about this one.  It's too complex for me to bash
it into my model on the fly.
 
>  Our paper argues for causality classes: causality would be enforced
>  within multiple groups in a single class, but might be violated over
>  class boundaries.  The idea is that most groups, by default, would
>  be in some single class.
 
Here is the crux of our discussion.  I am proposing to regulate
causality using an DAG of homogeneous nodes (each node being a PG) and
you, if I understand your argument correctly, are proposing to relax
the acyclic condition on the PG graph and avoid unneeded causality
enforcement by adding a second DAG (or flat list - are causality
classes heirarchical in your model?) encoding the exact causality
relationship desired.  It seems clear to me that your PG model is
theoretically more powerful, as it allows the group membership graph to
be arbitrary while guaranteeing causally correct message delivery.  A
more precise wording of my original question is: What practical
advantage does your more powerful theoretical model allow?  What
systems can you build that I can't?
 
>  There are a number of additional issues that we look hard at, too:
>    1) Why have the system manage groups at all?  Why not put this at the
>       application level?
 
Something must know about the physical topology of the network, which
must be assumed to be heterogenous and of arbitrary complexity.  Forcing
applications to manage their own groups means that each application
will have to contain some encoding (albeit in a "nice" form) of the
physical topology.  Not a pleasant prospect for the network administrator.
In the absence of some standard mechanism for making the network
topology available to applications I would expect a distributed
system development tool to provide some sort of distributed facility where
network topology could be encode.
 
 
>    2) Given groups, why not do multicast using parallel RPC?  (Or, perhaps,
>       transactions, if you want atomicity)?
 
I have looked at this (big surprise).  I assume, by the way, that you
mean "broadcast RPC", augmented by some sort of multiplexing over
different communications media.  I surmise that RPC protocols (even
over broadcast media) do not provide exactly the functionality we want
and hence will tend to be less efficient, likely prohibitively so.
Some RPC-like functionality, transfer syntaxes in particular, is
necessary.  I do believe that a stub generator (much like an RPC
compiler) is an excellent interface to an underlying PG abstraction.
The application developer might specify the set of messages as
augmented C function declarations much the way the current Netwise RPC
TOOL does.  Rather than generating client and server stubs, the PG stub
generator would generate a broadcast stub which PG members would call
to broadcast messages to other PG members, and a receive stub which
would be called by whatever was running the protocol when it needed to
deliver messages.  PG members would run as an event loop (obviating the
need for threads.)
 
Speaking of threads.  How does Isis handle non-kernel threads?  Using
Sun's lwp library for example, a blocking system call will block all
other threads in the process unless you are willing to pay the 500%
performance penalty (yes, I benchmarked it) for using the non-blocking
io library (Assuming you can.  Liblwp uses the interval timer and some
signals).  Yes but threads are more efficient...  Cough.
 
>    3) How much synchronization is needed between group membership and
>       multicast
>    4) Causality: see above
>    5) Do we need the other ISIS protocols (ABCAST, etc, etc, etc, etc)
 
For the moment I think so.  It would be nice to have a tool that would
deduce the correct protocol to use from some sort of specification but
I haven't seen even a good guess as to what that specification would
look like.  I currently am playing with four classes of multicast
protocols:
 
   1. Unreliable Broadcast: essentially UDP, no PG assumptions
   necessary
 
   2. Atomic Broadcast: (your FBCAST?) provides atomicity "eventually"
   (best effort) but makes no guarantees about message ordering.
   Requires PGs.
 
   3. Causal Broadcast: (your CBCAST) provides atomicity, and causal message
   ordering (withing PGs only).
 
   4. Totally Ordered Broadcast: (your ABCAST) provices atomicity and
   global (intra-PG) message ordering.
 
>    6) Is there a simple, clean architecture for all this?
 
Of course there is.  The obvious next step in this direction is to
move from procedure call interfaces to compiled specification file
interfaces.
 
Thanks again for the discussion.  It's been very stimulating.
 
--Scott