Path: utzoo!censor!geac!torsqnt!lethe!yunexus!ists!helios.physics.utoronto.ca!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!think.com!barmar
From: barmar@think.com (Barry Margolin)
Newsgroups: comp.unix.questions
Subject: Re: Can UNIX pipe connections be compiled?
Message-ID: <1991Jan18.224323.16722@Think.COM>
Date: 18 Jan 91 22:43:23 GMT
References: <1991Jan18.193234.216@rucs.runet.edu>
Sender: news@Think.COM
Organization: Thinking Machines Corporation, Cambridge MA, USA
Lines: 58

In article <1991Jan18.193234.216@rucs.runet.edu> dana@rucs.runet.edu (Dana Eckart) writes:
>Does there exist a piece of software (or is it even possible) to compile
>a pipe?  In particular, suppose you had 
>
>	ls -l | fgrep "Dec" | cut -f 4
>
>is there any way to compile the above pipeline so that the pieces can
>communicate more quickly?  I am looking for a general solution, not
>one that works only for the above example.

I'm not really sure I (or you) understand what you expect the pipe to be
compiled into.  On Unix, each program has to be run in its own process, so
they're going to have to use some form of inter-process communication to
feed the data to each other.  There are shell script compilers, but all
they do is save the overhead of parsing the commands and interpreting shell
built-ins; the compiled script still runs each command in its own process
and sets up pipes for them to communicate.
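
As a rough illustration of what the shell (or a compiled script) has to do
for a two-command pipeline, here is a minimal sketch in C.  The helper name
run_pipe2() is made up for this example, and error handling is kept to the
bare minimum; the point is only that each command still gets its own
process, with a pipe() between them.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "left | right" roughly the way a shell would.
 * Returns the exit status of the right-hand command, or -1 on error. */
int run_pipe2(char *const left[], char *const right[])
{
    int fd[2];
    int status;
    pid_t lpid, rpid;

    if (pipe(fd) == -1)
        return -1;

    lpid = fork();
    if (lpid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (lpid == 0) {
        /* Left child: its stdout becomes the write end of the pipe. */
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]);
        close(fd[1]);
        execvp(left[0], left);
        _exit(127);             /* only reached if exec failed */
    }

    rpid = fork();
    if (rpid == -1) {
        close(fd[0]);
        close(fd[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {
        /* Right child: its stdin becomes the read end of the pipe. */
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]);
        close(fd[1]);
        execvp(right[0], right);
        _exit(127);             /* only reached if exec failed */
    }

    /* The parent must close both ends, or the reader never sees EOF. */
    close(fd[0]);
    close(fd[1]);

    waitpid(lpid, NULL, 0);
    if (waitpid(rpid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with the argument vectors {"ls", "-l", NULL} and
{"fgrep", "Dec", NULL} would correspond to the first two stages of the
pipeline quoted above.  Nothing here makes the commands themselves any
faster; it only replaces the shell's parsing step.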

>The question arises because I have constructed some small programs which 
>become VERY slow when piped together.  It appears that if I can get around 
>the slow speed of standard (character based) i/o that things will be MUCH 
>faster.

If the programs that are used in the pipeline do character-at-a-time I/O,
then speeding up the pipe itself isn't going to help.  Compiling the
pipeline wouldn't change the programs; they'd still be doing
character-at-a-time I/O.

I strongly doubt that the speed of the pipe is the limiting factor; this is
a pretty simple mechanism whose performance is extremely important to most
Unix implementors.  I just timed the following on a Sun-4/330 running SunOS
4.0.3:

	cat file file file | cat >/dev/null

"file" is a 4Mbyte file on an NFS server.  The SunOS version of "cat" uses
mmap() to read in files named as arguments, so once it is all paged into
memory (I ran the command until it got zero page faults) nearly all the
overhead should be in the pipe (about 95% of the CPU time was system time,
and I doubt I was spending much time in the null device driver).  I was
getting about 4Mbyte/CPU-second throughput.
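
A similar measurement can be sketched without relying on "cat" at all.
The C fragment below is not the test described above, just an assumed
stand-in: a child process writes a given number of bytes into a pipe, the
parent reads and discards them, and getrusage() reports the CPU time used
by both.  The names pipe_copy() and cpu_seconds() and the 8192-byte buffer
are choices made for this example.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* User + system CPU seconds for RUSAGE_SELF or RUSAGE_CHILDREN. */
static double cpu_seconds(int who)
{
    struct rusage ru;

    if (getrusage(who, &ru) == -1)
        return 0.0;
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

/* Hypothetical benchmark: push `total` bytes through a pipe.
 * Returns the number of bytes the reader received, or -1 on error.
 * *cpu is set to the CPU seconds used by writer and reader together. */
long pipe_copy(long total, double *cpu)
{
    static char buf[8192];
    int fd[2];
    pid_t pid;
    ssize_t n;
    long received = 0;
    double before = cpu_seconds(RUSAGE_SELF) + cpu_seconds(RUSAGE_CHILDREN);

    if (pipe(fd) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the writer.  The data is generated in memory, so no
         * file or network I/O is mixed into the measurement. */
        long left = total;

        close(fd[0]);
        memset(buf, 'x', sizeof buf);
        while (left > 0) {
            size_t want = left < (long)sizeof buf ? (size_t)left : sizeof buf;

            n = write(fd[1], buf, want);
            if (n <= 0)
                _exit(1);
            left -= n;
        }
        _exit(0);
    }

    /* Parent: the reader.  Read until EOF and throw the data away. */
    close(fd[1]);
    while ((n = read(fd[0], buf, sizeof buf)) > 0)
        received += n;
    close(fd[0]);

    /* Child CPU time is only counted once the child has been waited for. */
    waitpid(pid, NULL, 0);
    *cpu = cpu_seconds(RUSAGE_SELF) + cpu_seconds(RUSAGE_CHILDREN) - before;
    return n < 0 ? -1 : received;
}
```

Dividing the returned byte count by *cpu gives a bytes per CPU-second
figure comparable in spirit to the one quoted above; the actual number
depends entirely on the machine and the kernel.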

And I think most stdio implementations don't actually do
character-at-a-time I/O.  getc() and putc() are usually implemented as
macros that read/write a buffer, and don't actually do any I/O until the
buffer is empty/full (putc()'s output buffer will also be flushed if you
call fflush()).
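
To make the distinction concrete, here is a small sketch contrasting the
two styles.  copy_stdio() and copy_bytewise() are names invented for this
example.  The first goes through getc()/putc() and so touches the kernel
only when a stdio buffer is empty or full; the second really does one
read() and one write() system call per character, which is the pattern
that would be slow on a pipe or anything else.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Buffered copy: getc()/putc() work on stdio's in-memory buffers and
 * only call read()/write() when a buffer runs empty or fills up.
 * Returns the number of characters copied, or -1 on error. */
long copy_stdio(FILE *in, FILE *out)
{
    long count = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        count++;
    }
    /* Push out whatever is still sitting in the output buffer. */
    if (ferror(in) || fflush(out) == EOF)
        return -1;
    return count;
}

/* Unbuffered copy: one read() and one write() system call per byte.
 * Returns the number of bytes copied, or -1 on error. */
long copy_bytewise(int in, int out)
{
    long count = 0;
    ssize_t n;
    char c;

    while ((n = read(in, &c, 1)) == 1) {
        if (write(out, &c, 1) != 1)
            return -1;
        count++;
    }
    return n < 0 ? -1 : count;
}
```

A filter written like copy_stdio(stdin, stdout) is already doing block
I/O underneath.  If a program looks like copy_bytewise() instead, that is
the thing to fix, and no change to the pipe will help.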

>Although I suspect I am stuck (unless I rewrite my code - combining the
>pieces programs into a single program), perhaps some kind netter will be
>able to save me a great deal of grief.

Have you actually profiled your programs and found that they are spending
most of their time doing I/O to pipes?
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar