Path: utzoo!attcan!uunet!samsung!usc!ucsd!ucbvax!hplabs!hpfcso!mjs
From: mjs@hpfcso.HP.COM (Marc Sabatella)
Newsgroups: comp.lang.misc
Subject: Re: dataflow shell
Message-ID: <8960001@hpfcso.HP.COM>
Date: 20 Nov 89 17:39:26 GMT
References: <0207@sheol.UUCP>
Organization: Hewlett-Packard, Fort Collins, CO, USA
Lines: 97

>> Unix pipe programs are one dimensional because they're limited by the shells
>> to be one dimensional.   [...]
>> Once you've got this dataflow shell, scaling it up and adding features to it
>> to make it a full-fledged language wouldn't be too hard.

>This particular trick can actually be done with the bourne shell as it
>currently exists.  Of course, just arbitrarily drawing two dataflows out
>of the grep doesn't make the program capable of producing two
>datastreams.  We would have to get used to more "pipe fitting" programs
>or switches to existing programs.

The real limitation here is not the shell, as has been observed; it is that
most programs have been written to understand stdin and stdout and not to care
about other streams.

As part of an MS project on adding message-based IPC to a simple multitasking
OS, we added all the necessary features to the OS and the interpreter (a Forth
interpreter - think of it as the shell) to do this.  There were only two
reasons I didn't build the dataflow shell itself - no obvious convenient
syntax to use, and a lack of time to think one up and implement it.

Anyone interested in "dataflow shells", please read on.  Can you suggest a
clean shell syntax for what we are doing?

What we had was the following (translated to be as Unix-like as possible):
All I/O is done through message ports (named pipes?), each of which can have
an arbitrary number of readers and writers.  Anyone can send (write) to
a port (assuming correct permissions if translated to Unix) with no special
preparation.  To receive messages (read), you connect to (open?) the port,
which places you on its 'mailing list'.  All future messages sent to that port
are forwarded to you, as well as anyone else connected to the port.  Messages
accumulate in your mailbox (no good analogy) and may be retrieved at your
leisure.
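
For readers who think better in code, here is a minimal sketch of the port
model just described.  It is written in Python purely for illustration; the
original system was a Forth interpreter on a small real-time OS, and the names
Port, connect and send are hypothetical, not taken from that system.

```python
from collections import deque


class Port:
    """Hypothetical message port: any number of writers and readers."""

    def __init__(self, name):
        self.name = name
        self.mailboxes = []      # the port's "mailing list"

    def connect(self):
        """Join the mailing list and return the caller's private mailbox."""
        mailbox = deque()
        self.mailboxes.append(mailbox)
        return mailbox

    def send(self, message):
        """Forward the message to every reader connected right now.

        Writers need no preparation.  If nobody is connected, the
        message goes nowhere.
        """
        for mailbox in self.mailboxes:
            mailbox.append(message)


port = Port("data")
port.send("early")       # no reader connected yet, so nobody receives it
a = port.connect()
b = port.connect()
port.send("first")
port.send("second")

# Each reader drains its own mailbox whenever it likes.
print(list(a))
print(list(b))
```

Both readers end up with the same two messages, and neither sees the one sent
before it connected.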

Given this set up, one implemented "dataflow pipes" by brute force.  You have
a command port shared by all processes in the pipe, and as many data ports as
you wish.  You start up each process with a list of the named ports it is to
use as input and output, and the code within each process would then connect to
each of its input ports.  It was not necessary to use more than one output port
(since ports can have an arbitrary # of readers) unless you wished to produce
different streams (say, stdout and stderr).  And since this is a real-time OS,
and since messages sent to a port are lost if no one is there to receive them,
each process in the pipe connected to the command port and waited for the
'start' command.
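
The start-up handshake can be sketched as follows.  This is an illustration
under assumptions, not the original code: Python threads stand in for the OS
processes, queue.Queue stands in for a mailbox, and the Port class and the
stage names are made up for the example.

```python
import queue
import threading


class Port:
    """Hypothetical port whose mailboxes support a blocking read."""

    def __init__(self):
        self.mailboxes = []
        self.lock = threading.Lock()

    def connect(self):
        mailbox = queue.Queue()
        with self.lock:
            self.mailboxes.append(mailbox)
        return mailbox

    def send(self, message):
        with self.lock:
            targets = list(self.mailboxes)
        for mailbox in targets:
            mailbox.put(message)


def stage(name, commands, started):
    # Do nothing until the 'start' command arrives on the command port.
    while commands.get() != "start":
        pass
    started.append(name)     # real work would begin here


command_port = Port()
started = []
workers = []
for name in ("reader1", "reader2", "merger"):
    # Connect BEFORE any command is sent: a message sent to a port
    # with no one connected would be lost.
    inbox = command_port.connect()
    worker = threading.Thread(target=stage, args=(name, inbox, started))
    worker.start()
    workers.append(worker)

command_port.send("start")   # one send releases every stage
for worker in workers:
    worker.join()

print(sorted(started))
```

The point of the ordering is that every stage is on the command port's mailing
list before 'start' goes out, so a single send reaches all of them.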

One example of this was a pipe that took input from an arbitrary number of
sources (usually at least a disk file and a synthesizer), merged the input
streams (sorted in real time order), and wrote the result to an output port.
The output was often redirected to at least two devices - a disk file, and a
synthesizer.  Disk files and synthesizers had device drivers which allowed them
to be treated as ports for input, and as processes for output.

The commands to set this up were something like:

1. start up a file reader process for each input file
   parameters are name of input file "port" (device driver)
   and name of output port
	(this was necessary so the files would present their input to the
	 merger process without being queried - the files contain timestamped
	 messages which the reader process would send at appropriate times
	 relative to the "start" command)
2. start up merger process
   parameters are list of input ports and name of output port
3. connect output file(s) & synthesizer "processes" (device drivers)
   to merger output port
4. give "start" command

On the "start" command, the file readers would start sending messages, and the
merger would start reading its messages.  You could also start generating input
from any of the synthesizer devices at this time.  Given a "dataflow shell",
the kludge of the "start" command would be unnecessary - at any rate, it would
be hidden from you by the shell (since this is a real time system, you
probably would have to keep the model of starting up each process and have them
wait).  The kludge of the named ports for reader & merger output could also be
done away with - the shell would presumably set these up for you.
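
As a rough sketch of what the merger stage does, here is the merge step in
Python.  It works on complete lists rather than on messages arriving in real
time, so it shows only the ordering logic; the function name merge_streams and
the message contents are invented for the example.

```python
import heapq


def merge_streams(*streams):
    """Merge streams of (timestamp, message) pairs into one stream.

    Each input must already be in time order, as a file of timestamped
    messages would be.
    """
    return heapq.merge(*streams, key=lambda event: event[0])


# Hypothetical inputs: two file readers and one live synthesizer.
file1 = [(0, "file1: event a"), (480, "file1: event b")]
file2 = [(240, "file2: event a"), (720, "file2: event b")]
synth = [(100, "synth1: event a")]

# Every consumer connected to the merger's output port sees every message.
disk_file, synthesizer = [], []
for event in merge_streams(file1, file2, synth):
    for sink in (disk_file, synthesizer):
        sink.append(event)

print([timestamp for timestamp, _ in disk_file])
```

The two sinks stand in for the output file and the synthesizer device drivers
connected to the merger's single output port.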

Graphically, what we have is the following:

inputfile1 --- reader1 \
                        |
inputfile2 --- reader2  |         outputfile1 (device driver)
                      \ |        /
                       merger ---
                      / |        \
synthesizer1 ---------  |         synthesizer1 (device driver)
                        |
                       /
synthesizer2 ----------

Given that a) the Bourne shell method of knowing file descriptors used for I/O
by programs does not fit in well with this message based model; and b) it is
excruciatingly ugly; can anyone suggest a syntax for expressing this sort of
thing?

--------------
Marc Sabatella
marc%hpfcrt@hplabs.hp.com