Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!cmcl2!beta!hc!ames!lll-tis!ohlone!nelson
From: nelson@ohlone.UUCP (Bron Nelson)
Newsgroups: comp.misc
Subject: Faster/cheaper execution of Unix pipelines?  A proposal.
Message-ID: <385@ohlone.UUCP>
Date: Fri, 16-Oct-87 03:53:56 EDT
Article-I.D.: ohlone.385
Posted: Fri Oct 16 03:53:56 1987
Date-Received: Sat, 17-Oct-87 18:03:39 EDT
Organization: Cray Research Inc., Livermore, CA
Lines: 53
Keywords: Unix pipes, program composition, small is beautiful

There has been considerable discussion (mostly in comp.arch) about
"Big Programs Hurt Performance" and the relative merits of a single
program with lots of options, vs. communicating small programs,
in particular the use of pipes in the Unix world.  The canonical
example used is the Unix "ls" command producing single-column and
multi-column output, as opposed to piping single-column output
through "pr."  The software engineer in me insists that "many small
routines, each doing one job well" is the correct thing to do, but
my practical side recognizes the high cost of Unix pipes.  I think
the answer is not to give up on "small is beautiful," but rather to
invent a cheaper method of combining small pieces together.  I ask
for your comments, proposals, and
examples of existing systems that do a good job.

The rest of this note is a proposal for one method of combining
programs together that seems like it should be possible to do as an
"add on" in an existing system.  Comments and criticisms are welcome
(as well as cries of "but OS/xyz already does that!").

I'll use the simple pipeline "ls | pr -4" as my example.  Execution
of this requires a fork and exec for each command, a pipe, and a
read or write system call for each chunk of data that passes through.
Instead, let's invent a program something like a compiler/linker that
can take the binaries of ls and pr, and directly compose them into a
single process.  I'll call this mythical beast the "Pipeline Composer,"
and the individual pieces "ls" and "pr -4" will be "Pipeline Fragments."
Instead of a pipe, we just allocate a 4K buffer directly in the composed
program's address space, and calls to "write" by ls, and "read" by
pr would fill and empty this local buffer, rather than an external one.
The tricky part that needs to be written is the driver routine for the
composed program, and new interfaces to the read/write routines that
will be used by the composed program.
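
As a rough illustration of the replacement read/write interface, here
is a minimal sketch in Python (chosen for brevity; a real composer would
work on compiled binaries).  The class name LocalPipe, its methods, and
the convention of returning 0 or an empty result instead of blocking
are all assumptions made for this example, not part of any existing
system.

```python
class LocalPipe:
    """Fixed-capacity byte buffer living in the composed program's own
    address space, standing in for the kernel pipe (hypothetical name)."""

    def __init__(self, capacity=4096):
        self.capacity = capacity
        self.data = bytearray()
        self.closed = False  # set by the writer when it has finished

    def write(self, chunk):
        """Copy in as much of chunk as fits and return the byte count.
        A return of 0 for a non-empty chunk means the buffer is full."""
        room = self.capacity - len(self.data)
        taken = chunk[:room]
        self.data += taken
        return len(taken)

    def read(self, n):
        """Remove and return up to n bytes; b"" means the buffer is empty."""
        out = bytes(self.data[:n])
        del self.data[:n]
        return out

    def at_eof(self):
        """True once the writer has closed and everything has been read."""
        return self.closed and not self.data


# Usage: a tiny 8-byte buffer so the "full" case is easy to see.
pipe = LocalPipe(capacity=8)
accepted = pipe.write(b"file1\nfile2\n")  # 12 bytes offered, 8 accepted
first = pipe.read(6)                      # b"file1\n"
```

A short write or an empty read is the point at which a fragment would
hand control to the driver instead of making a system call.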

When a call to read (write) finds the buffer empty (full), you do not
block and trap to the system, as you would with a pipe; instead you
trap to the driver.  The driver looks for a pipeline fragment that is
not blocked (or has become unblocked), and (re)starts that fragment.
Now, this is tricky business in that the driver has to keep track of
what amounts to multiple process contexts and multiple call stacks,
but it all seems doable, if not easy.  No doubt some restrictions will
have to be placed on pipeline fragments to ensure they can be composed
(various system calls would be off limits), but I could live with that.
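
The driver idea can be sketched with Python generators, where each
"yield" plays the role of the trap to the driver and the generator
object holds the suspended fragment's context.  This is only a toy
under stated assumptions: the names Buffer, ls_fragment, pr_fragment,
and run_composed are hypothetical, the fragments are hand-written
stand-ins rather than the real ls and pr binaries, and the driver
simply resumes fragments round-robin and lets a still-blocked fragment
yield again, rather than tracking which ones have become unblocked.

```python
from collections import deque


class Buffer:
    """In-process stand-in for the pipe between two fragments."""

    def __init__(self, capacity):
        self.items = deque()
        self.capacity = capacity
        self.closed = False  # set when the writing fragment is done


def ls_fragment(buf, names):
    """Stand-in for "ls": writes one name at a time into the buffer."""
    for name in names:
        while len(buf.items) >= buf.capacity:
            yield  # buffer full: trap to the driver instead of the kernel
        buf.items.append(name)
    buf.closed = True


def pr_fragment(buf, columns, out):
    """Stand-in for "pr -N": groups names into rows of `columns` entries."""
    row = []
    while True:
        while not buf.items:
            if buf.closed:  # writer finished and buffer drained
                if row:
                    out.append(" ".join(row))
                return
            yield  # buffer empty: trap to the driver
        row.append(buf.items.popleft())
        if len(row) == columns:
            out.append(" ".join(row))
            row = []


def run_composed(fragments):
    """Driver: resume each unfinished fragment in turn until all are done."""
    ready = deque(fragments)
    while ready:
        frag = ready.popleft()
        try:
            next(frag)  # (re)start the fragment until it yields again
        except StopIteration:
            continue  # fragment finished; drop it
        ready.append(frag)


# Usage: a 2-entry buffer forces several switches between the fragments.
out = []
buf = Buffer(capacity=2)
run_composed([ls_fragment(buf, ["a", "b", "c", "d", "e", "f"]),
              pr_fragment(buf, 4, out)])
# out is now ["a b c d", "e f"]
```

Generators can only suspend from their own top-level frame, which is a
much easier case than saving the full call stack of an arbitrary
binary; that gap is where the real work in the proposal lies.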

Under this scheme, the composed program has significantly less interaction
with the system than the pipeline would.  The composed program is of
course somewhat bigger and slower than a custom-built utility would be,
but it should be smaller and faster than the pipeline.  The individual
pieces can be small and simple; not encumbered by vast numbers of rarely
used options.  Only the people wanting the additional functionality need
pay for it, and their cost is modest (I hope).

-----------------------
Bron Nelson     {ihnp4, lll-lcc}!ohlone!nelson
Not the opinions of Cray Research