Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!cmcl2!beta!hc!ames!lll-tis!ohlone!nelson
From: nelson@ohlone.UUCP (Bron Nelson)
Newsgroups: comp.misc
Subject: Faster/cheaper execution of Unix pipelines? A proposal.
Message-ID: <385@ohlone.UUCP>
Date: Fri, 16-Oct-87 03:53:56 EDT
Article-I.D.: ohlone.385
Posted: Fri Oct 16 03:53:56 1987
Date-Received: Sat, 17-Oct-87 18:03:39 EDT
Organization: Cray Research Inc., Livermore, CA
Lines: 53
Keywords: Unix pipes, program composition, small is beautiful

There has been considerable discussion (mostly in comp.arch) about "Big Programs Hurt Performance" and the relative merits of a single program with lots of options, vs. communicating small programs, in particular the use of pipes in the Unix world. The canonical example used is the Unix "ls" command producing single-column and multi-column output, as opposed to piping single column output through "pr."

The software engineer in me insists that "many small routines, each doing one job well" is the correct thing to do, but my practical side recognizes the high cost of Unix pipes. I think the answer is not to give up on "small is beautiful;" rather it seems that the answer is to invent a cheaper method of combining small pieces together. I ask for your comments, proposals, and examples of existing systems that do a good job.

The rest of this note is a proposal for one method of combining programs together that seems like it should be possible to do as an "add on" in an existing system. Comments and criticisms are welcome (as well as cries of "but OS/xyz already does that!").

I'll use the simple pipeline "ls | pr -4" as my example. Execution of this requires several forks/execs, read/write system calls, etc. Instead, let's invent a program something like a compiler/linker that can take the binaries of ls and pr, and directly compose them into a single process.
I'll call this mythical beast the "Pipeline Composer," and the individual pieces "ls" and "pr -4" will be "Pipeline Fragments." Instead of a pipe, we just allocate a 4K buffer directly in the composed program's address space, and calls to "write" by ls, and "read" by pr would fill and empty this local buffer, rather than an external one.

The tricky part that needs to be written is the driver routine for the composed program, and new interfaces to the read/write routines that will be used by the composed program. When a call to read (write) finds the buffer empty (full), you do not block and trap to the system (as you would with a pipe); instead you trap to the driver. The driver looks for a pipeline fragment that is not blocked (or has become unblocked), and (re)starts that fragment. Now, this is tricky business in that the driver has to keep track of what amounts to multiple process contexts and multiple call stacks, but it all seems do-able (not easy). No doubt some restrictions will have to be placed on pipeline fragments to ensure they can be composed (various system calls would be off limits), but I could live with that.

Under this scheme, the composed program has significantly less interaction with the system than the pipeline would. The composed program is of course somewhat bigger and slower than a custom built utility would be, but it should be smaller and faster than the pipeline. The individual pieces can be small and simple; not encumbered by vast numbers of rarely used options. Only the people wanting the additional functionality need pay for it, and their cost is modest (I hope).

-----------------------
Bron Nelson {ihnp4, lll-lcc}!ohlone!nelson
Not the opinions of Cray Research
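
The driver scheme described above can be illustrated with a small sketch. This is only a toy model under loud assumptions, not the binary-level composer the note proposes: Python generators stand in for the saved contexts and call stacks of the fragments, a "yield" stands in for the trap to the driver, and the names LocalPipe, ls_fragment, pr_fragment, and run_composed are all hypothetical. The two fragments are simplified stand-ins for "ls" and "pr -4" (the column layout is not pr's real one).

```python
from collections import deque

BUF_SIZE = 4096  # the 4K local buffer from the proposal


class LocalPipe:
    """In-process replacement for a pipe: a bounded byte buffer."""

    def __init__(self, capacity=BUF_SIZE):
        self.capacity = capacity
        self.data = bytearray()
        self.closed = False  # set once the writing fragment is done

    def write(self, chunk):
        """Copy chunk into the buffer, yielding to the driver when it is full."""
        while chunk:
            room = self.capacity - len(self.data)
            if room == 0:
                yield  # buffer full: "trap to the driver"
                continue
            self.data += chunk[:room]
            chunk = chunk[room:]

    def read(self, n):
        """Return up to n bytes (b"" at end of input), yielding while empty."""
        while not self.data and not self.closed:
            yield  # buffer empty: "trap to the driver"
        out = bytes(self.data[:n])
        del self.data[:n]
        return out

    def close(self):
        self.closed = True


def ls_fragment(names, out):
    """Hypothetical stand-in for ls: writes one name per line."""
    for name in sorted(names):
        yield from out.write(name.encode() + b"\n")
    out.close()


def pr_fragment(inp, columns, result):
    """Hypothetical stand-in for 'pr -4': packs input lines into rows."""
    pending, names = b"", []
    while True:
        chunk = yield from inp.read(64)
        if not chunk:
            break  # writer closed and buffer drained
        pending += chunk
        *lines, pending = pending.split(b"\n")
        names.extend(line.decode() for line in lines)
    for i in range(0, len(names), columns):
        result.append("  ".join(names[i:i + columns]))


def run_composed(fragments):
    """Driver: resume fragments round-robin until every one has finished."""
    ready = deque(fragments)
    while ready:
        frag = ready.popleft()
        try:
            next(frag)  # runs until the fragment blocks on a buffer
        except StopIteration:
            continue  # fragment finished; do not requeue it
        ready.append(frag)  # blocked: let another fragment run


# A deliberately tiny buffer forces several switches between the fragments.
pipe = LocalPipe(capacity=16)
rows = []
run_composed([
    ls_fragment(["delta", "alpha", "charlie", "bravo", "echo"], pipe),
    pr_fragment(pipe, 4, rows),
])
print("\n".join(rows))
```

The whole "pipeline" here runs in one process with no pipe, fork, or exec; the only scheduling is the driver's round-robin loop. A real composer working on existing binaries would have to save and restore machine-level contexts itself, which is the hard part the note points out and which this sketch sidesteps by relying on the language's generators.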