Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!sun!amdahl!krs
From: krs@uts.amdahl.com (Kris Stephens [Hail Eris!])
Newsgroups: comp.unix.shell
Subject: Re: Problem using multiple 'head' commands in shell script
Keywords: head shell buffering
Message-ID: <fcPl016n13RO00@amdahl.uts.amdahl.com>
Date: 30 Jan 91 17:05:53 GMT
References: <1671@abekrd.UUCP> <6925@exodus.Eng.Sun.COM>
Reply-To: krs@amdahl.uts.amdahl.com (Kris Stephens [Hail Eris!])
Organization: Amdahl Corporation, Sunnyvale CA
Lines: 173

In article <6925@exodus.Eng.Sun.COM> mcgrew@ichthous.Eng.Sun.COM (Darin McGrew) writes:
>In article <1671@abekrd.UUCP> garyb@abekrd.UUCP (Gary Bartlett) writes:
>->Can someone explain to me what is happening with the following Bourne shell
>->script and more importantly how I can get around it:
>->
>->	#!/bin/sh
>->	cat file | (
>->		head -200
>->		echo "Line 201 follows"
>->		head -200
>->		echo "Line 401 follows"
>->		cat
>->	)
>->
>->...
>->It looks like 'head' initially reads in a whole buffer of data from file
>->(stdin), prints out the requisite number of lines and then dumps the rest
>->of the buffer.  The next 'head' then reads the NEXT buffer....
>
>Yes, head reads a bufferful at a time.  I'd use awk:
>
>	awk '	NR==201	{print "Line 201 follows"}
>		NR==401	{print "Line 401 follows"}
>			{print}' < file

And this might be faster (note, though, that I'll need to left-justify
this to avoid inserting leading white-space).

-- start fragment --
sed \
-e '200a\
Line 201 follows' \
-e '400a\
Line 401 follows' < file
-- end fragment --

I'm making no statement here that the sed call is better than the
awk call, just that if performance is significant, you might want
to try this approach too and compare execution times.
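
For anyone who wants to check the two against each other before timing
them, here's a minimal sketch.  The scratch file names and the 1000-line
test input are my own inventions, not anything from the original
question.  It generates a numbered file, runs both commands over it, and
compares the results with cmp.  One difference to watch for: on a file of
exactly 200 (or 400) lines, sed still appends its message after the last
line, while awk prints nothing because line 201 never arrives.

```shell
#!/bin/sh
# Sketch only: /tmp/sedawk.$$ is a made-up scratch name.
tmp=/tmp/sedawk.$$
trap 'rm -f $tmp.in $tmp.awk $tmp.sed' 0

# Build a 1000-line test file: "line 1" ... "line 1000"
awk 'BEGIN { for (i = 1; i <= 1000; i++) print "line " i }' > $tmp.in

# The awk version from the quoted article
awk 'NR==201 {print "Line 201 follows"}
     NR==401 {print "Line 401 follows"}
     {print}' < $tmp.in > $tmp.awk

# The sed version (appended text left-justified on purpose)
sed -e '200a\
Line 201 follows' -e '400a\
Line 401 follows' < $tmp.in > $tmp.sed

if cmp -s $tmp.awk $tmp.sed
then
	same=yes
else
	same=no
fi
echo "outputs identical: $same"
```

Once the outputs agree, putting time in front of each command on a
realistically large file gives the comparison.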

If, however, the    echo "Line ?01 follows"    in the original example
was a place holder for "I want to do other stuff here, then pick up
processing with the next set of lines", neither the awk nor the sed
calls will allow it, as both simply insert the line-counting messages
into the stream of data from file.

Dog slow though it be, the following will do it:

	#!/bin/sh
	(
	i=1
	while [ $i -lt 201 ]
	do
		read line; echo "$line"
		i=`expr $i + 1`
	done
	: process some stuff here
	while [ $i -lt 401 ]
	do
		read line; echo "$line"
		i=`expr $i + 1`
	done
	: process some more stuff here
	cat -
	) < file

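It's only slightly better in ksh, by replacing the i=1 assignment with
typeset -i i=1   and replacing the expr call to increment $i with
((i += 1)).  In either case, mayhem will result if file isn't at
least 400 lines long: read fails at end of file, but the loops keep
counting and echo an empty line for every line that isn't there.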
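
A cheap guard against the short-file problem is to test read's exit
status, which is nonzero at end of file.  Here's a sketch; copylines is a
name I made up, and the five-line here-document is just a stand-in for
file.  Like the loops above, it inherits read's habits of stripping
leading white-space and interpreting backslashes.

```shell
#!/bin/sh
# copylines N: copy up to N lines from stdin to stdout.
# Returns 1 if the input runs out before N lines were copied.
# (copylines is a hypothetical helper, not part of any standard shell.)
copylines() {
	want=$1
	got=0
	while [ $got -lt $want ]
	do
		read line || return 1
		echo "$line"
		got=`expr $got + 1`
	done
	return 0
}

# Demonstration: the messages appear only when the full chunk was copied.
(
copylines 2 && echo "Line 3 follows"
copylines 2 && echo "Line 5 follows"
copylines 2 && echo "Line 7 follows"	# input runs out here: no message
cat
) <<EOF
a
b
c
d
e
EOF
```

It's every bit as slow as the loops above, since it's the same loop, but
it stops cleanly instead of padding the output.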

You may be forced into multiple reads of the file to get something
resembling good performance:

	#!/bin/sh
	(
	sed 200q file
	echo "Line 201 follows"
	sed -e '1,200d' -e '400q' file
	echo "Line 401 follows"
	sed '1,400d' file
	)

The saving graces here are that, even though the file is opened three
times, (1) only the first 200 lines are read thrice and the second
200 twice, and (2) one avoids the nearly nightmarish performance of
the while loops in the example preceding this one.  It doesn't hurt,
too, that sed is pretty quick.
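
If you want to convince yourself that the three sed calls carve the file
up with nothing lost and nothing repeated, a quick check along these
lines should do.  The scratch file name and the 450-line test input are
made up for the purpose; the sed commands are the ones from the script
above with the echo calls left out.

```shell
#!/bin/sh
# Sketch only: /tmp/slices.$$ is a made-up scratch name.
f=/tmp/slices.$$
trap 'rm -f $f $f.out' 0

# Build a 450-line test file so that the last slice is a short one
awk 'BEGIN { for (i = 1; i <= 450; i++) print "line " i }' > $f

(
sed 200q $f				# lines 1-200
sed -e '1,200d' -e '400q' $f		# lines 201-400
sed '1,400d' $f				# lines 401 to end of file
) > $f.out

# The three slices, concatenated, should be the original file
if cmp -s $f $f.out
then
	echo "slices reassemble the file"
else
	echo "slices do NOT match the file" 1>&2
fi
```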

Now, let's take it one step further and generalize it into a function...

	#!/bin/sh
	
	#
	# A function to get $2 lines from file $1 starting at $3
	# Only the file ($1) is required
	#
	getlines() {
		file=$1
		count=$2
		start=${3:-1}	# default start at line 1
		if [ ! -r "$file" ]
		then
			echo "getlines: file '$1' not readable" 1>&2
			return 1
		fi
		# Whole file?
		if [ $start -eq 1 -a "$count" = "" ]
		then
			cat $file
			return $?
		fi
		# From start to EOF?
		if [  "$count" = "" ]
		then
			sed -n "$start,\$p" $file
			return $?
		fi
		# Start at line 1 for count lines?
		if [ $start -eq 1 ]
		then
			sed "${count}q" $file
			return $?
		fi
		# We have a start other than 1 and a count
		cut=`expr $start - 1`		# Don't print through $cut
		end=`expr $cut + $count`	# $end is last to print
		if [ $end -le $cut ]
		then
			echo "getlines: bad count($count)/start($start)" 1>&2
			return 1
		fi
		sed -e "1,${cut}d" -e "${end}q" $file
		return $?
		}

	#
	# Mainline code
	#
	file=${1:-file}	# If there's an arg, it's the filename
	wc=`wc -l < $file`
	count=200
	current=1
	while [ $current -le $wc ]
	do
		if [ $current -ne 1 ]
		then
			echo "Next line is $current"
		fi
		if getlines $file $count $current
		then
			current=`expr $current + $count`
		else
			saverc=$?
			echo "$0: getlines returned $saverc" 1>&2
			exit $saverc
		fi
	done

All that's left is to have flags for count and maybe the initial "current".
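
Those flags could be handled along the following lines.  This is only a
sketch: the option letters -c and -s and the name parse_args are my own
choices, and it assumes a shell with the getopts builtin (on an older sh
you'd need getopt(1) instead).

```shell
#!/bin/sh
# parse_args [-c count] [-s start] [file]
# Sets $count, $current and $file for the mainline loop.
# (parse_args and its option letters are hypothetical.)
parse_args() {
	count=200		# default chunk size
	current=1		# default first line
	OPTIND=1		# restart option parsing on each call
	while getopts c:s: opt
	do
		case $opt in
		c)	count=$OPTARG ;;
		s)	current=$OPTARG ;;
		*)	echo "usage: $0 [-c count] [-s start] [file]" 1>&2
			return 2 ;;
		esac
	done
	shift `expr $OPTIND - 1`
	file=${1:-file}		# If there's an arg left, it's the filename
}

parse_args "$@" || exit $?
echo "count=$count current=$current file=$file"
```

With that in place, the three assignments at the top of the mainline
code go away and the while loop can stay as it is.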

>->Thanks,
>->Gary
>
>You're welcome.

I'll second that!
...Kris
-- 
Kristopher Stephens, | (408-746-6047) | krs@uts.amdahl.com | KC6DFS
Amdahl Corporation   |                |                    |
     [The opinions expressed above are mine, solely, and do not    ]
     [necessarily reflect the opinions or policies of Amdahl Corp. ]