Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!texsun!letni!mic!convex!convex.COM
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.lang.perl
Subject: Re: Fast way to join lines ?
Summary: Dan's C version isn't worth the bother
Message-ID: <109688@convex.convex.com>
Date: 2 Dec 90 03:47:11 GMT
References: <9830001@hpfcso.HP.COM> <2967:Dec122:39:3790@kramden.acf.nyu.edu>
Sender: usenet@convex.com
Reply-To: tchrist@convex.COM (Tom Christiansen)
Organization: CONVEX Software Development, Richardson, TX
Lines: 106

In article <2967:Dec122:39:3790@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
:In article <9830001@hpfcso.HP.COM> hai@hpfcso.HP.COM (Hai Vo-Ba) writes:
:> 	I am using perl to join every N lines of a very large file
:>     together and wonder what is the faster way to do this:

(code restored for further reference)
:>
:>      $\ = "\n";              # set output record separator
:>
:>      while (<>) {
:>          chop;       # strip record separator
:>          $line .= $_;
:>          if (($. % 32) == 0) {
:>              print $line;
:>              $line = '';
:>          }
:>      }
:>
:>      if ($line ne '') { print $line; }

Dan then writes:

:The faster way is something like this:
:
:#include <stdio.h>
:main()
:{
: int ch; int t = 33;
: while ((ch = getchar()) != EOF)
:  {
:   if (ch == '\n') if (--t) continue; else t = 33;
:   putchar(ch);
:  }
:}

Well, I'm afraid we've got just a few problems here.  

The first one is that the C code doesn't do what the Perl code does, and
the poster requested a faster way to do the same thing.  The Perl construct
"while (<>)" is not equivalent to "while (<STDIN>)".  The construct used
by the original poster will traverse its command line argument list and
treat it as one continuous input stream, correctly processing any "-"
arguments, and defaulting to stdin if no arguments are given.  Dan's code
only consults stdin, so it's not as functional.
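
To make the difference concrete, here is a rough sketch of what
"while (<>)" does for you, written out by hand.  It is an illustration
only, not how perl implements it: each_input_line() is a made-up helper
name, and the sketch uses modern Perl idioms (lexical filehandles and
three-argument open), whereas the real <> opens each argument with the
older two-argument semantics.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the argument list the way <> would,
# handing every line of every input to $callback in order.
sub each_input_line {
    my ($callback, @args) = @_;
    @args = ('-') unless @args;              # no arguments: read stdin
    for my $name (@args) {
        my $fh;
        if ($name eq '-') {
            $fh = \*STDIN;                   # "-" means standard input
        } elsif (!open($fh, '<', $name)) {
            warn "Can't open $name: $!\n";   # complain, then move on
            next;
        }
        while (my $line = <$fh>) {
            $callback->($line);              # one continuous stream
        }
        close($fh) unless $name eq '-';
    }
}
```

Calling each_input_line(sub { print $_[0] }, @ARGV) would then behave
roughly like "print while <>".  A C program that wanted to match the
Perl original would need the same sort of loop around its getchar().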

The second problem is that (as I've mentioned before) while it's good to
keep in perspective the idea of using the right tool for the job at hand,
this *IS* comp.lang.perl, and the poster was clearly looking for a
perlian solution to his problem.  How do you, Dan, know that this wasn't
just a code fragment extracted for demonstration purposes from a larger
program of the poster's?

Look at it this way:  if I hung around comp.lang.c and kept posting Perl
solutions to people's C questions, it would eventually grate on people's
nerves.  A non-productive flame war would start up that would waste
net bandwidth, the readers' time, and just generally rain on everyone's
parade unnecessarily.  We've had a very flame-free, productive little
group here since its inception, so let's keep it that way, OK?

The third problem is that the poster asked for a faster way.  There are
several interpretations of faster, including but not limited to faster
writing time, faster compile time, faster debugging time, and faster run
time.  First let me offer a faster Perl version of the poster's original
code:

    while (<>) {
        chop if $. % 32;
        print;
    }

If this doesn't need to be part of another program, you might as
well just do it this way:

    perl -pe 'chop if $. % 32'

or else

    perl -pe 'chop if $. & 31'
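
For anyone who wants to poke at the logic away from a real input
stream, here is a small sketch of the same rule applied to a list of
lines.  Both join_every() and same_test() are made-up names for
illustration, and the sketch uses chomp, which assumes a newer perl
than the one the one-liners above were written for.  The second
function checks that "& 31" and "% 32" agree, which holds because 32 is
a power of two.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the "chop if $. % N" rule to a list of
# lines and return the joined text.
sub join_every {
    my ($n, @lines) = @_;
    my $out   = '';
    my $count = 0;                      # plays the role of $.
    for my $line (@lines) {
        $count++;
        chomp $line if $count % $n;     # newline survives on every Nth line
        $out .= $line;
    }
    return $out;
}

# Hypothetical check: for non-negative integers up to $limit,
# "$i % 32" and "$i & 31" give the same value.
sub same_test {
    my ($limit) = @_;
    for my $i (0 .. $limit) {
        return 0 unless ($i % 32) == ($i & 31);
    }
    return 1;
}
```

One thing the sketch makes easy to see: when the number of input lines
is not a multiple of N, the last joined line comes out with no trailing
newline, whereas the original poster's code always ends its final line
with one.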

Now, let's first talk run time here.  On my 2250-line termcap file, Dan's
C program (which you'll recall doesn't do all that the Perl one does) runs
in this much time:

	0.450524 real        0.340401 user        0.054859 sys

whereas my Perl one-liner runs in just this much time:

	0.684193 real        0.450535 user        0.083110 sys

I find that pretty respectable; I don't think we're going to quibble about
a couple seconds, let alone eleven hundredths of a second of user time.

[ I probably shouldn't even mention that if we eat the whitespace, mine
can be reduced to 12 bytes, and Dan's to 130, but I just did anyway. :-/ ]

As far as I'm concerned, and I'll bet you this goes for most of the rest
of the readership of this newsgroup as well, anything that you can express
as a quick one-liner without having to go into an editor (let alone
compile an a.out!) is worth doing that way.  Those 0.11 seconds of user
time you lost on the run are more than made up for by how quickly you
can write and run the Perl code.  Furthermore, it's a lot more legible
because its complexity is drastically reduced, which means it'll be more
maintainable as well.


--tom

