Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!convex!tchrist@convex.COM
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.lang.perl
Subject: Re: speed: V2 verses V3
Message-ID: <4055@convex.UUCP>
Date: 18 Dec 89 05:18:27 GMT
References: <1808@uvaarpa.virginia.edu> <6609@jpl-devvax.JPL.NASA.GOV> <4047@convex.UUCP> <1989Dec18.032836.16434@psuvax1.cs.psu.edu>
Sender: usenet@convex.UUCP
Reply-To: tchrist@convex.COM (Tom Christiansen)
Organization: CONVEX Software Development, Richardson, TX
Lines: 57

In article <1989Dec18.032836.16434@psuvax1.cs.psu.edu> schwartz@psuvax1.cs.psu.edu (Scott Schwartz) writes:

>I think this benchmark is not computationally expensive enough
>to give good results.  One second of runtime tells nothing, really.

Basically all true.  That's why I picked an entry 2000 lines into
the file and then passed it the same argument 3 times, which you didn't do
-- you made it look only once.  I chose that benchmark because it did a
variety of text things, like regular expression matching, substitutions,
and concatenation.  I ran it on a big file and made several passes over
that file so that it would run long enough for the results to be useful.
I think a bigger problem with it is that we don't all have the same
termcap files.

Here's another termcap benchmark, which exercises split() and associative
arrays.  It also shows about a 1/3 speedup going from perl2 to perl3.

All runs produce this output:
    saw 1365 entries on 2235 lines, 15 duplicates

Here are the timings (both machines had 128meg):
    c1% time perl2 tcount.pl < /etc/termcap  > /dev/null
    4.7u 0.7s 0:05 96% 0+6k 0+1io 150pf+0w
    c1% time perl3 tcount.pl < /etc/termcap  > /dev/null
    3.2u 0.4s 0:03 96% 0+8k 0+2io 132pf+0w
    c2% time perl2 tcount.pl < /etc/termcap  > /dev/null
    1.4u 0.3s 0:01 94% 0+0k 1+1io 137pf+0w
    c2% time perl3 tcount.pl < /etc/termcap  > /dev/null
    0.9u 0.2s 0:01 94% 0+0k 0+1io 116pf+0w
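
The "about 1/3" figure can be checked against the user times above.  This
is only a minimal sketch, assuming a Bourne-style shell and an awk that
takes -v; the function name speedup is made up for illustration:

```shell
# speedup OLD NEW: print the percentage reduction in run time
# going from OLD seconds to NEW seconds (hypothetical helper).
speedup() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.0f%%\n", (old - new) / old * 100 }'
}

speedup 4.7 3.2    # c1 user times, perl2 vs perl3
speedup 1.4 0.9    # c2 user times, perl2 vs perl3
```

That comes to roughly 32% on c1 and 36% on c2, which is where "about a
1/3" comes from.  With runs this short, the 0.1-second resolution of the
times matters, so treat the second digit as noise.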

And this was the program:
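    #!/usr/bin/perl
    # tcount.pl: count the terminal names in a termcap file
    while (<>) {
	$lines++;
	next if /^[#\s]/;	# skip comments, blank and continuation lines
	chop;
	s/:.*//;		# keep only the first field (the names)
	split(/\|/);		# names are separated by '|'; result goes in @_
	for (@_) {
	    $count++;
	    $seen{$_}++;
	}
    }
    @keys = keys(seen);
    # NB: $#keys is the last subscript, one less than the number of
    # distinct names, so the duplicate figure comes out one too high.
    printf "saw %d entries on %d lines, %d duplicates\n",
	    $count, $lines, $count - $#keys;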

Scott may not like this one either, because it also runs too quickly.
Anybody want to post a better benchmark?  I'm having trouble finding
something that'll actually run for a long time.  My cfman program does,
but it's totally unsuitable as a benchmark because everyone has different
man pages.
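
One cheap way to stretch a short benchmark like this is to feed it the same
input several times over.  What follows is only a sketch, assuming a
POSIX-style shell; the function name repeat_file is made up for
illustration:

```shell
# repeat_file N FILE: write FILE to standard output N times in a row
# (hypothetical helper for lengthening a benchmark's input).
repeat_file() {
    n=$1
    while [ "$n" -gt 0 ]; do
        cat "$2"
        n=$((n - 1))
    done
}

# Example (not run here):
#   repeat_file 10 /etc/termcap | perl3 tcount.pl > /dev/null
```

The entry, line, and duplicate counts all change when the input is
repeated, since every name then shows up more than once, so the output
line is no longer comparable with the one above.  Only the timings are of
interest in that case.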

--tom

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"