Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!mcsun!ukc!pyrltd!abekrd!garyb
From: garyb@abekrd.UUCP (Gary Bartlett)
Newsgroups: comp.unix.shell
Subject: Re: Problem using multiple 'head' commands in shell script
Keywords: head shell buffering
Message-ID: <1678@abekrd.UUCP>
Date: 31 Jan 91 12:37:39 GMT
References: <1671@abekrd.UUCP> <6925@exodus.Eng.Sun.COM> <fcPl016n13RO00@amdahl.uts.amdahl.com>
Organization: Abekas Video Systems Ltd, Reading, England
Lines: 81

In <fcPl016n13RO00@amdahl.uts.amdahl.com> krs@uts.amdahl.com (Kris Stephens [Hail Eris!]) writes:
>In article <6925@exodus.Eng.Sun.COM> mcgrew@ichthous.Eng.Sun.COM (Darin McGrew) writes:
>>In article <1671@abekrd.UUCP> garyb@abekrd.UUCP (Gary Bartlett) writes:
>>->...
>>->It looks like 'head' initially reads in a whole buffer of data from file
>>->(stdin), prints out the requisite number of lines and then dumps the rest
>>->of the buffer.  The next 'head' then reads the NEXT buffer....
>>

>If, however, the    echo "Line ?01 follows"    in the original example
>was a place holder for "I want to do other stuff here, then pick up
>processing with the next set of lines", neither the awk nor the sed
>calls will allow it, as both simply insert the line-counting messages
>into the stream of data from file.

This is indeed what I intended - see my last piece of news on the subject.

>Dog slow though it be, the following will do it:
>	#!/bin/sh
>	(
>	i=1
>	while [ $i -lt 201 ]
>	do
>		read line; echo "$line"
>		i=`expr $i + 1`
>	done
>	: process some more stuff here
>	cat -
>	) < file

This is effectively what I started out using - a 'while' loop, an 'expr'
counter, and a couple of 'read's.  Hideously slow!

>You may be forced into multiple reads of the file to get something
>resembling good performance:

>The saving graces here are that, even though the file is opened three
>times, (1) only the first 200 lines are read thrice and the second
>200 twice, and (2) one avoids the nearly nightmarish performance of
>the while loops in the example preceding this one.  It doesn't hurt,
>too, that sed is pretty quick.

The thing is, the file I'm merging from may be very long (i.e. very many
sed passes).

>Now, let's take it one step further and generalize it into a function...

I DO like the function idea though.

I did actually write my own 'head' (C) program which turned off all buffering
of the stdin before doing any reading.  This did the trick and worked in the
shell script.  It was faster but not greatly so - I guess it had to read every
character individually.  I did try using line-buffering but this did not work.
It still lost data (although not as much as when using the full-buffering of
head).  I'm not overly happy with that solution though - I bet it's not at
all portable.
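
For anyone who wants to see the data loss for themselves, here is a small
sketch that feeds two successive 'head' commands from one pipe.  The function
name demo_head_buffering and the marker text are made up for illustration.
What the second 'head' prints (if anything) depends on how much the first one
read ahead, so it may differ between systems.

```shell
#!/bin/sh
# Sketch only: demo_head_buffering is a hypothetical name.
# Generates 1000 short lines on a pipe and lets two 'head' processes
# read from that same pipe one after the other.
demo_head_buffering() {
    awk 'BEGIN { for (i = 1; i <= 1000; i++) print "line " i }' |
    {
        head -n 2                                  # prints "line 1", "line 2"
        echo "--- second head starts here ---"
        # The first head may already have consumed far more than two lines,
        # so this one is not guaranteed to continue at "line 3".
        head -n 2
    }
}
```

Calling demo_head_buffering with no arguments prints the first two lines, the
marker, and then whatever the second 'head' finds left on the pipe.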

*** FLASH OF INSPIRATION ***

I have an idea:
- Process the original file by putting the line number at the beginning of
  each line,
- Process the file to be merged so that the merge points are at the beginning
  of each of these lines,
- Cat the two processed files together and pass through 'sort',
- Remove line numbers from beginning of resulting file, QED

It doesn't matter how big either file is.
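
A rough sketch of the steps listed above, under some assumptions of mine: the
file to be merged has one insertion per line, written as "<line-number> <text>",
meaning "put <text> after that line of the original".  The function name
merge_at_lines and that file layout are hypothetical, not something settled in
this thread.  Each line gets two numeric keys (the target line number and a
sequence number) so that 'sort' keeps original lines ahead of the insertions
and keeps several insertions at one point in their given order.

```shell
#!/bin/sh
# Sketch only: merge_at_lines and the "<line-number> <text>" layout of the
# second file are assumptions made for illustration.
#   $1 = original file
#   $2 = file of insertions, one per line: "<line-number> <text>"
merge_at_lines() {
    {
        # Original lines: key is "<line-number> 0".
        awk '{ printf "%d 0 %s\n", NR, $0 }' "$1"
        # Insertions: key is "<target-line> <sequence>", sequence >= 1,
        # so they sort after the original line with the same number.
        awk '{ n = $1; sub(/^[0-9]+ ?/, ""); printf "%d %d %s\n", n, NR, $0 }' "$2"
    } |
    sort -k1,1n -k2,2n |            # order by line number, then sequence
    sed 's/^[0-9]* [0-9]* //'       # strip the two keys again
}

# Usage (file names are placeholders):
#   merge_at_lines original.txt inserts.txt > merged.txt
```

Both input files are read exactly once and 'sort' does the heavy lifting, so
there is no shell loop and no repeated sed pass per merge point.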

Thoughts?

Thanks again for some very useful input,
Gary

-- 
---------------------------------------------------------------------------
Gary C. Bartlett               NET: garyb@abekrd.co.uk
Abekas Video Systems Ltd.     UUCP: ...!uunet!mcsun!ukc!pyrltd!abekrd!garyb
12 Portman Rd,   Reading,    PHONE: +44 734 585421
Berkshire.       RG3 1EA.      FAX: +44 734 567904
United Kingdom.              TELEX: 847579