Path: utzoo!utgpu!watserv1!watmath!att!att!linac!pacific.mps.ohio-state.edu!zaphod.mps.ohio-state.edu!julius.cs.uiuc.edu!apple!spies!zorch!xanthian
From: xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan)
Newsgroups: comp.sys.amiga.tech
Subject: Re: PIPEs
Message-ID: <1990Nov18.150258.16061@zorch.SF-Bay.ORG>
Date: 18 Nov 90 15:02:58 GMT
References: <1990Nov10.082242.22949@agate.berkeley.edu> <8283@gollum.twg.com>
Organization: SF-Bay Public-Access Unix
Lines: 257

david@twg.com (David S. Herron) writes:
> pete@violet.berkeley.edu (Pete Goodeve) writes:
>> Kent Paul Dolan (xanthian@zorch.SF-Bay.ORG) writes:

>	[ an interesting idea which I (personally) think looks a bit
>	  ugly and daunting to type from the keyboard ..]

Perhaps, but it is easy and intuitive, and it solves many problems that
currently have no solution.  You haven't seen "ugly" yet; see below.

>> Hmmm. What an interesting idea... In fact it gets more intriguing as
>> I think on it. "Fan out pipes" are something that even unix can't do,
>> as far as I can see. (You can 'tee' to a pipe and a file, but not to
>> two parallel pipes, can you?) And a convenient way of "broadcasting"
>> data to a number of processes is something that I've had on my mind
>> for a long, long time.

> Unix has the capability of doing this - all the OS facilities are
> there it's just that it's hard to represent it in a linear line of
> text. Since Unix commands (under /bin/{,c,k}sh) are a linear line of
> text this is a problem. In fact, I vaguely remember seeing a multi-way
> `tee' program come across some sources group once.

I don't think so. The example, reformatted below, explicitly has a
process feeding a prior process, something which doesn't work with
tempfiles. I could have included a loop, the notation supports it, but
making it anything but an infinite data source then requires knowing the
internals of the executing commands, a point I wanted to avoid raising
in a first proposal.
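
For the fan-out half of the question (the multi-way `tee' David
half-remembers), the core is small. A minimal sketch in Python, purely
illustrative: `fan_out` and its parameters are names made up here, not
part of any existing shell or of Pete's pipe joiner, and it copies
chunks to the sinks one after another rather than running the consumers
in parallel.

```python
import io


def fan_out(source, sinks, chunk_size=4096):
    """Copy everything readable from `source` to every sink in `sinks`.

    Hypothetical helper: each sink stands in for one outgoing pipe.
    Returns the number of bytes read from the source.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:              # empty read means end of input
            break
        for sink in sinks:         # every sink sees the same bytes, in order
            sink.write(chunk)
        total += len(chunk)
    for sink in sinks:
        sink.flush()
    return total


if __name__ == "__main__":
    src = io.BytesIO(b"one line of traffic\n")
    left, right = io.BytesIO(), io.BytesIO()
    fan_out(src, [left, right])
    print(left.getvalue() == right.getvalue())
```

This covers only the broadcast; it says nothing about a consumer feeding
an earlier producer, which is the part temp files cannot do.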

> Another problem is -- where might this be used? As I said above, the
> notation you're suggesting is a bit ugly (to my eye).

Anywhere you'd want a complex interconnection of interoperating programs;
consider throwing up a half dozen requestors from a half dozen programs
to control some multimedia spectacular.  This allows them to be tightly
coupled without being co-written.

> It's not the sort of command I'd be typing in off the top of my head
> and, besides, I've lived on Unix for years with just linear pipes. The
> few times where it might've been nice to have tree-structures of
> piped-together processes has always been in a shell script & it was
> pretty easy to use temp files of my own & delete them when done.

Except that temp files kill speed, hog space, and are quite
inappropriate for the "immortal" processes that run from boot time to
boot time in modern multitasking OSs. There was no suggestion that an
extremely complex nest of multiple input, multiple output pipes had to
be created _literally_ from the command line; as with all complex
processes, in practice that model would usually give way to the scripts
you suggest.

> For an ad-hoc creation of tree-structured piles of processes it seemed
> you'd want some sort of graphical shell in which you'd have process
> names and pipe symbols that you could drag around & connect up & play
> around with.

This is certainly an alternative, and it has been tried several times;
the lack of precision, problems of manipulation, and lack of a common
agreement on terms and appearances have so far left the efforts mostly
research toys. That doesn't lessen the ultimate value of a working
"visual pipeboard" solution, but it does suggest that a simpler, shell
oriented solution for the shell user has a place.

> cmd1 | (
>    tee /tmp/some-file; some-other-command </tmp/some-file; rm /tmp/some-file 
>	) | cmd2

> Hmm.. that just came to me, and is perfectly reasonable in Unix
> shell-ese. The string of commands under "some-other-command" would not
> even start until the "tee" process finishes, then any output it makes
> would go into `cmd2', _following_ the output of `tee'. Also if you
> iterate on the () (in Unix shell-ese that starts up a subprocess
> executing the command string within ()'s) you can create a fairly
> bizarre nesting of processes. Using `&' and `>' would send
> some-other-command off into the background so that `cmd2' can finish
> without having to wait for some-other-command to finish.

But the chance for loops and avoiding temp files would still be missing,
as would more crucial mechanisms not yet discussed; see below.
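
To make the loop case concrete, here is a minimal sketch in Python with
two stages joined in a cycle. Everything in it is an assumption for
illustration: the stage names, the use of in-process queues in place of
real pipes, and the countdown rule that lets the loop terminate. That
rule is exactly the kind of knowledge of a command's internals that a
finite loop needs.

```python
import queue
import threading


def feedback_loop(start):
    """Run stage_a -> stage_b -> stage_a until stage_b stops feeding back."""
    forward = queue.Queue()    # stage_a to stage_b
    backward = queue.Queue()   # stage_b back to stage_a: the loop
    seen = []

    def stage_a():
        while True:
            value = backward.get()
            if value is None:          # stage_b ended the loop
                forward.put(None)      # pass the shutdown downstream
                return
            seen.append(value)
            forward.put(value)

    def stage_b():
        while True:
            value = forward.get()
            if value is None:
                return
            # The termination rule lives inside the stage (assumed here).
            backward.put(value - 1 if value > 1 else None)

    threads = [threading.Thread(target=stage_a),
               threading.Thread(target=stage_b)]
    backward.put(start)                # prime the loop with one item
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print(feedback_loop(3))
```

Without the countdown inside `stage_b`, the two stages would pass data
around forever.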

I'm also not aware of uses for "&" to mean backgrounding except at line
end; this may well be just my ignorance; I can't read most shell scripts
I see.

> But toss a few inline awk scripts into that pipeline and it quickly
> becomes un{read,maintain}able.

Hey, this is true of any computer science language you care to name; it
hasn't stopped even sed, awk, and APL from being exceptionally useful.

>>             PIPE cmda args >-{1,2}  +
>>             cmdb args >-{1,2,3}     +
>>             -<1 cmdc args >-2       +
>>             -<2 cmdd args >-4       +
>>             -<3 cmde args >-1       +
>>             -<4 cmdf args > result
>>
>>This isn't much worse (to my eye!) in appearance that the other,

> Well.. I guess there's just no pleasing some people. If I'm reading
> this right you want the input of cmdd to be some combination of the
> outputs of cmda, cmdb and cmdc. If each of those are executing
> concurrently how do you avoid mixing the outputs? If you make sure the
> outputs aren't mixed, how do you specify the order that cmdd see's.
> Would it be in the order specified in the command, random, or what?

More exciting and pertinent was the feed from cmde to cmdc, at least a
nontrivial task in the current shells.  I was more amazed that no one,
in the context of AmigaOS, challenged my coopting of "{}" as metasymbols
with the "obvious" meaning.  The supposition that Amiga users are not
Unix shell users in drag stands disproven, at least for that portion
inhabiting USENet.

You didn't have my advantage of sitting through Pete Goodeve's
presentation of his pipe joiner at BADGE. The trick to keep things as
straight as they need to be is that "things" are passed from pipe
traffic producer to pipe traffic consumer in packets using AmigaOS's
message and reply system.  A simple flush() after each meaningful unit
by the producer, combined with a sufficiently large pipe packet size,
will serve to synchronize the pipe traffic into meaningful chunks for
the consumer.
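
The packet idea can be sketched as follows. This is not Pete's pipe
joiner and does not use AmigaOS message ports; it assumes Python's
`queue.Queue` as a stand-in, with one `put` playing the role of one
flushed packet. The names `producer` and `fan_in` and the `None`
end-of-stream marker are made up for the example.

```python
import queue
import threading


def producer(name, units, outbox):
    """Send each meaningful unit as one indivisible packet."""
    for unit in units:
        outbox.put((name, unit))   # one put == one flushed packet
    outbox.put((name, None))       # assumed end-of-stream marker


def fan_in(sources):
    """Merge several producers into one stream of whole packets.

    `sources` maps a producer name to its list of units. Packets from
    different producers may interleave, but no packet is ever split.
    """
    outbox = queue.Queue()
    threads = [threading.Thread(target=producer, args=(name, units, outbox))
               for name, units in sources.items()]
    for t in threads:
        t.start()
    still_open = len(threads)
    received = []
    while still_open:
        name, unit = outbox.get()
        if unit is None:
            still_open -= 1
        else:
            received.append((name, unit))
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    for packet in fan_in({"cmda": ["a1\n", "a2\n"], "cmdb": ["b1\n"]}):
        print(packet)
```

The order between producers is whatever the scheduler gives you; only
the order within each producer and the integrity of each unit are kept.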

>ohwell.. so long as unix-ese-speaking shell's are available then
>I'll be happy ;-)..  (And, no, I didn't "grow up" on Unix .. I've
>used many many other kinds of systems .. I find the notation in
>Unix shell script-ese (Bourne or Korn shell.. I *emphatically*
>don't write programs with csh) to be very convenient & powerful.)

Speaking of power ... ;-). I hadn't yet thrown my real spanner into the
works, hoping to let this cud be first digested by the onlookers.

The trouble with pipes, even if the work above is accepted and they gain
fan-out and fan-in functionality just outside the bounds of the
individual process, is that the present notation usefully supports only
processes that are, at least with respect to piped data, "filters":
single input, single output.

Filters are not the whole world, and, though tagged packetized data is a
way to make a non-filter in concept act like a filter in reality with
the addition of a smart fanout with respect to the tags, it would be
nice to support the more interesting programs with m-way-data-in,
n-way-control-in, o-way-data-out, p-way-control-out, 1-way-messages-out,
and 1-way-log-out paths, a more realistic piece of the processing
universe.

In order to do this in a visual plumbing GUI, you have to have room for
lots of "quick disconnect fittings" per process, with labels and tags and
"keying" to make sure the right pipe hits the right fitting, and lots of
other complexities that make the problem pretty intractable.

It is much less complex to design this functionality into a script
shell; you just have to provide a tag corresponding to the file handle
for each pipe fitting, with the stdin=0, stdout=1, stderr=2 of Unix
filter processes being the paradigmatic example.
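
As a rough illustration of a fitting beyond the usual three, here is a
sketch in Python for a POSIX system. It assumes the child is told its
extra handle number on the command line; the child program, the function
name, and the sample strings are all invented for the example, and
`pass_fds` is not available on every platform.

```python
import os
import subprocess
import sys

# Hypothetical child: writes data to stdout (handle 1) and a second
# stream to whatever extra handle number it is given as its argument.
CHILD = (
    "import os, sys; fd = int(sys.argv[1]); "
    "os.write(1, b'edited text\\n'); os.write(fd, b'forward diff\\n')"
)


def run_child_with_extra_pipe():
    """Give a child one pipe fitting besides stdin, stdout, and stderr."""
    read_end, write_end = os.pipe()
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(write_end)],
        stdout=subprocess.PIPE,
        pass_fds=(write_end,),     # keep the extra handle open in the child
    )
    os.close(write_end)            # parent keeps only the read side
    data, _ = child.communicate()
    with os.fdopen(read_end, "rb") as extra:
        side_channel = extra.read()
    return data, side_channel


if __name__ == "__main__":
    print(run_child_with_extra_pipe())
```

The handle number is the "tag": the script names it, and the process
only has to agree on which number carries which stream.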

I won't claim to have any "non-ugly" ways, or ways that would be fun to
type raw as opposed to editing into a script, to do this, but here is
one possible example of how it could be done. I _will_ contend that
making this possible from a scripting language would be nearly
infinitely useful (if for no other reason than that it provides a model
for the GUI solution), to allow multitasking support for high speed, low
data storage overhead ways to interconnect and tightly couple processes
by independent authors.

Suppose we have three processes; the first eats a stream of text and a
stream of editing commands, and emits the edited text and forward and
reverse diff files. The second samples the system clock, times the edit
commands and selectively feeds back new commands which vary depending on
the time versus space efficiency of the current system load, and writes
a log of its work, and a file of the time tagged load averages, to disk.
The third does a consistency check of the diff files, feeds a control to
the first when problems occur, and appends a join of its log and the
second process's log to the input text file. All write any needed error
messages to the system console. All run from boot up to shut down.  (This
is all nonsense created just to have an example, of course.)

We will start with Pete's revision of my original proposal; why fight
over trivia? However, despite his noting the ease of parsing if the
inpipes are to the left, I have returned to the easier to read AmigaOS
and Unix form with the command name first, after seeing how intensely
ugly it looked the other way with this proposal's increased complexity
burying the commands in mid line.

Next is a summary of the three processes, providing file handle (fh)
assignments, followed by a script entry to spawn and interconnect them.
In hopes of making it all viewable on a single screen, vertical
whitespace has been omitted, so you might want to single line step your
newsreader through the next bit. It exactly fits in 24 lines with
"process1:" at the top of the screen. I've made some arbitrary but
reasonable guesses at file handles for clock and type.

process1: reads text from fh0, editing commands from fh1, and diff
control messages from fh2; it writes edited text to fh3, forward diff to
fh4, reverse diff to fh5, copies its input commands to fh6, and logs
errors to fh7.
process2: reads the clock tics from fh0, reads command copies also from
fh0 to get the best available times from interleaved messages, monitors
process1's error log at fh1, runs at a higher priority than the other
two processes, writes edit commands to fh2, writes a log to fh3, writes
averages to fh4, and logs errors to fh5.
process3: reads the forward diff from fh0, the reverse diff from fh1,
and the process2 log from fh2; it writes added input for process1 to fh3,
a joined log to fh4 and logs error messages to fh5.
  PIPE                                                                     +
  clock    -tic 20 >-6.0                                                   +
  type     -<<1.0 >-3.1 input_text                                         +
  type     -<<2.0 >-4.1 edit_commands                                      +
  process1 -<3.0 -<4.1 -<5.2 >-.3 edited_text >-8.4 >-9.5 >-6.6 >-{7,11}.7 +
  changetaskpri 5                                                          +
  process2 -<6.0 -<7.1 >-2.2 >-{10,12}.3 >-.4 averages >-11.5              +
  changetaskpri 0                                                          +
  process3 -<8.0 -<9.1 -<10.2 >-1.3 >-.4 p3log >-11.5                      +
  type     -<11.0 >-.1 *                                                   +
  type     -<12.0 >-.1 p2log  &


I _said_ it was ugly! ;-) If it isn't obvious, the m.n format is
"pipename.filehandle", where the pipename belongs to the script and not
really to the receiving process, but is named at the receiving process
command line, and the pipes are (in this case, for tutorial purposes,
not out of necessity), numbered sequentially vertically down the screen;
while the file handles belong to the process on the same line. Sorry
about having the clock output to file handle zero, though. ;-) So, for
example, where process1 writes to the tee ">-{7,11}.7", the first half
of the tee is being read by process2 at "-<7.1"; the pipename is "7",
process1 is also using its filehandle fh7 (".7") (purely by coincidence)
to write the data, and process2 is using its filehandle fh1 (".1") to
read the data.
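
The "pipename.filehandle" tokens are regular enough to parse
mechanically. A minimal sketch in Python, covering only the forms
explained in this post: "-<m.n", ">-m.n", the tee ">-{a,b}.n", and the
file form ">-.n". The `Fitting` type and `parse_fitting` are names
invented here, pipe names are assumed to be alphanumeric, and anything
else (including "-<<") is rejected rather than guessed at.

```python
import re
from typing import NamedTuple, Tuple


class Fitting(NamedTuple):
    direction: str            # "in" for -<, "out" for >-
    pipes: Tuple[str, ...]    # pipe names; empty means a filename follows
    handle: int               # file handle number inside the process


# Direction, then either a {a,b,...} tee list or a bare (possibly empty)
# pipe name, then a dot and the file handle number.
_FITTING = re.compile(
    r"^(?P<dir>-<|>-)(?P<pipes>\{[\w,\s]+\}|\w*)\.(?P<fh>\d+)$"
)


def parse_fitting(token):
    """Parse one token such as '>-{7,11}.7' into a Fitting."""
    match = _FITTING.match(token)
    if match is None:
        raise ValueError(f"not a recognized pipe fitting: {token!r}")
    raw = match.group("pipes")
    if raw.startswith("{"):
        pipes = tuple(name.strip() for name in raw[1:-1].split(","))
    elif raw:
        pipes = (raw,)
    else:
        pipes = ()            # file form: the next word is a filename
    direction = "in" if match.group("dir") == "-<" else "out"
    return Fitting(direction, pipes, int(match.group("fh")))


if __name__ == "__main__":
    for token in (">-{7,11}.7", "-<7.1", ">-.3"):
        print(token, parse_fitting(token))
```

A real parser would also have to pair each file-form token with the
filename that follows it; that step is left out here.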

Note that the next-to-last "type" is synchronizing the three "stderr"
outputs of process[123] to the console.  Notice the omission
of the pipename when input is from or output is to a file, but that the
filehandle is still needed, thus several ">-.1 filename"s, for example.

[I probably buggered the semantics of changetaskpri in there; do the
obvious right thing.]

I will freely confess that 1) this was an easy one, a hard one would get
messy and require lots of parser working storage to process, 2) it took
me over twenty minutes to type this easy one, and 3) I would never think
of attempting to type this as a command, but would always make it a
shell script, and then probably have to debug that.
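
Much of that debugging would be hunting for pipes wired at only one
end, which a shell could check before launching anything. A minimal
sketch in Python, assuming the script has already been reduced to
(command, direction, pipename) triples; `dangling_pipes` and that input
shape are inventions for this example, not part of the proposal.

```python
from collections import defaultdict


def dangling_pipes(fittings):
    """Report pipes that are read but never written, and vice versa.

    `fittings` is an iterable of (command, direction, pipe_name) with
    direction "in" or "out". Returns (no_writer, no_reader), two sorted
    lists of pipe names.
    """
    writers = defaultdict(set)
    readers = defaultdict(set)
    for command, direction, pipe in fittings:
        if direction == "out":
            writers[pipe].add(command)
        elif direction == "in":
            readers[pipe].add(command)
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    no_writer = sorted(set(readers) - set(writers))
    no_reader = sorted(set(writers) - set(readers))
    return no_writer, no_reader


if __name__ == "__main__":
    wiring = [
        ("cmda", "out", "1"),
        ("cmda", "out", "2"),
        ("cmdc", "in", "1"),
        ("cmdd", "in", "3"),   # nobody writes pipe 3
    ]
    print(dangling_pipes(wiring))
```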

Nevertheless, it provides the kind of interconnection of processes
lacking in Unix and its lookalikes at the command interface level.
Lacking such facilities, processes like the above have to be designed
and coded as monolithic projects, rather than loose coded and plugged
together at the command interface. The seeming ugliness here would
remove a much more intense process fork and file handle maintenance
ugliness currently rife in the C code of large Unix programming suites.

No, I can't write the code to make this work. Daydreaming about it
passes the time.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>