Path: utzoo!censor!geac!torsqnt!lethe!yunexus!ists!helios.physics.utoronto.ca!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!think.com!barmar
From: barmar@think.com (Barry Margolin)
Newsgroups: comp.unix.questions
Subject: Re: Can UNIX pipe connections be compiled?
Message-ID: <1991Jan18.224323.16722@Think.COM>
Date: 18 Jan 91 22:43:23 GMT
References: <1991Jan18.193234.216@rucs.runet.edu>
Sender: news@Think.COM
Organization: Thinking Machines Corporation, Cambridge MA, USA
Lines: 58

In article <1991Jan18.193234.216@rucs.runet.edu> dana@rucs.runet.edu (Dana Eckart) writes:
>Does there exist a piece of software (or is it even possible) to compile
>a pipe? In particular, suppose you had
>
>    ls -l | fgrep "Dec" | cut -f 4
>
>is there anyway to compile the above pipeline so that the pieces can
>communicate more quickly. I am looking for a general solution, not
>one that works only for the above example.

I'm not really sure I (or you) understand what you expect the pipe to be
compiled into. On Unix, each program has to be run in its own process, so
they're going to have to use some form of inter-process communication to
feed the data to each other. There are shell script compilers, but all
they do is save the overhead of parsing the commands and interpreting shell
built-ins; the compiled script still runs each command in its own process
and sets up pipes for them to communicate.

>The question arises because I have constructed some small programs which
>become VERY slow when piped together. It appears that if I can get around
>the slow speed of standard (character based) i/o that things will be MUCH
>faster.

If the programs that are used in the pipeline do character-at-a-time I/O,
then speeding up the pipeline isn't going to help. Compiling the pipeline
wouldn't change the programs; they'll still be doing character I/O.

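To make the point about separate processes concrete, here is a rough sketch
in C of what a shell (or a "compiled" script) has to do for a two-stage
pipeline. run_pipeline() is a name I made up for illustration, and the
error handling is minimal; the calls it makes are the standard pipe(),
fork(), dup2(), execvp(), and waitpid().

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "producer | consumer" roughly the way a shell would: one pipe and
 * two child processes.  Returns the consumer's exit status, or -1 on
 * error.  run_pipeline() is a hypothetical helper, not a library call. */
int run_pipeline(char *const producer[], char *const consumer[])
{
    int fd[2];
    int status;
    pid_t p1, p2;

    if (pipe(fd) == -1)
        return -1;

    p1 = fork();
    if (p1 == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (p1 == 0) {
        /* First child: stdout becomes the write end of the pipe. */
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]);
        close(fd[1]);
        execvp(producer[0], producer);
        _exit(127);             /* reached only if exec failed */
    }

    p2 = fork();
    if (p2 == -1) {
        close(fd[0]);
        close(fd[1]);
        waitpid(p1, NULL, 0);
        return -1;
    }
    if (p2 == 0) {
        /* Second child: stdin becomes the read end of the pipe. */
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]);
        close(fd[1]);
        execvp(consumer[0], consumer);
        _exit(127);             /* reached only if exec failed */
    }

    /* Parent: close both ends so the consumer sees EOF once the
     * producer exits, then wait for both children. */
    close(fd[0]);
    close(fd[1]);
    waitpid(p1, NULL, 0);
    if (waitpid(p2, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with { "ls", "-l", NULL } and { "fgrep", "Dec", NULL }, this does
about what the first two stages of your pipeline do; a third stage needs a
second pipe and a third fork. Compiling the script doesn't make any of
these calls go away.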
I strongly doubt that the speed of the pipe is the limiting factor; this is
a pretty simple mechanism whose performance is extremely important to most
Unix implementors. I just timed the following on a Sun-4/330 running SunOS
4.0.3:

    cat file file file | cat >/dev/null

"file" is a 4Mb file on an NFS server. The SunOS version of "cat" uses
mmap() to read in files named as arguments, so once it is all paged into
memory (I ran the command until it got zero page faults) nearly all the
overhead should be in the pipe (about 95% of the CPU time was system time,
and I doubt I was spending much time in the null device driver). I was
getting about 4Mbyte/CPU-second throughput.

And I think most stdio implementations don't actually do
character-at-a-time I/O. getc() and putc() are usually implemented as
macros that read/write a buffer, and don't actually do any I/O until the
buffer is empty/full (putc()'s output buffer will also be flushed if you
call fflush()).

>Although I suspect I am stuck (unless I rewrite my code - combining the
>pieces programs into a single program), perhaps some kind netter will be
>able to save me a great deal of grief.

Have you actually profiled your programs and found that they are spending
most of their time doing I/O to pipes?
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar