Path: utzoo!utgpu!attcan!uunet!seismo!sundc!pitstop!sun!amdcad!ames!mailrus!ncar!noao!arizona!naucse!sbw
From: sbw@naucse.UUCP (Steve Wampler)
Newsgroups: comp.arch
Subject: Re: A request... (summary of responses - long)
Message-ID: <1004@naucse.UUCP>
Date: 3 Nov 88 00:54:05 GMT
Organization: Northern Arizona University, Flagstaff, AZ
Lines: 143

At the request of others, here is a summary of the performance
measures that I've received (so far) from other people on the
net.  Let me start with a comment:

	ANYONE who uses these as realistic benchmarks should be
	laughed off the net.

These programs test various algorithms/implementations for a
very specialized test case.  They might provide some insight,
but there are far better performance measures out there.

Also, in retrospect, the file being searched is simply too
small to provide accurate measures on the more interesting
machines.  It would be fairly easy to modify the file
creation program to produce a file 10 times larger, but I
cannot see asking people to donate 2.3MB of disk for this task.
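
One way around the small-file problem without the extra disk might
be to repeat the search often enough that the total rises well above
the clock resolution, then divide by the repeat count.  What follows
is only a minimal C sketch under stated assumptions: the data is
already in memory, count_matches() and time_search() are hypothetical
names (they are not part of fgr or ff), and the naive search merely
stands in for whatever routine is actually being measured.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the routine being timed: counts
   (possibly overlapping) occurrences of pat in the first n
   bytes of buf. */
static long count_matches(const char *buf, size_t n, const char *pat)
{
    size_t m = strlen(pat);
    size_t i;
    long hits = 0;

    if (m == 0 || m > n)
        return 0;
    for (i = 0; i + m <= n; i++)
        if (memcmp(buf + i, pat, m) == 0)
            hits++;
    return hits;
}

/* Run the search reps times and return the average CPU seconds
   per run, as measured by clock().  The match count of the last
   run is stored through hits. */
static double time_search(const char *buf, size_t n, const char *pat,
                          int reps, long *hits)
{
    clock_t start, stop;
    volatile long sink = 0;   /* keeps the loop from being optimized away */
    int r;

    start = clock();
    for (r = 0; r < reps; r++)
        sink = count_matches(buf, n, pat);
    stop = clock();

    *hits = sink;
    return (double)(stop - start) / CLOCKS_PER_SEC / reps;
}
```

If reps is chosen so that the whole loop runs for a second or more,
the per-run figure is no longer limited by a single clock tick.  The
buffer will probably stay in cache across repetitions, though, so
this is not a substitute for a genuinely larger file.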

There were a few people who offered to help that I am unable
to reach, for various reasons (one person will apparently get
the source file sometime in the next 23 days, as near as I
can tell from the messages his host's mailer daemon sends me).
I would like to thank you, and apologize for not being able
to contact you more personally.

The results are given here in tbl-troff source form.  If you
want to look at them, and don't have tbl and/or troff, you
might try to deduce the results by examining this file.

My thanks to all the people who responded.  I know some of
you took a fair amount of time to get times for your machines.
If I can return the favor (not likely - my machine is the 3B1!)
I'll see what I can do.

--- snip "Results.t" ---
.TL
Performance Measurements
.SH
Introduction
.LP
The following table gives the raw timings for several related
programs on a variety of computers.
Times reported are CPU times spent in user code.
.LP
The four test programs are (in order of appearance in the table):
.IP "fgr" 1i
\f(TTfgr\fR is a special case version of \fIfgrep\fR supplied by an
unnamed computer manufacturer.
It prints out the time spent in the search portion of its
code, as returned by the function \f(TTclock()\fR.
.IP "fgrep" 1i
\f(TTfgrep\fR is the \fIfgrep\fR program as found on the measured
machine.
It is invoked with the \f(TT-c\fR option, searching for
\f(TTkataveni\fR in a data file equivalent to the one built
internally by \f(TTfgr\fR.
.IP "grep" 1i
\f(TTgrep\fR is the \fIgrep\fR program as found on the measured
machine.
It is invoked with the same arguments as \f(TTfgrep\fR.
.IP "ff" 1i
\f(TTff\fR is the implementation of the Boyer-Moore algorithm
from the book \fI"Software Tools"\fR by Webb Miller.
The only modification was to add support for a \f(TT-c\fR option.
It is invoked with the same arguments as \f(TTfgrep\fR.
.LP
In most cases, values are averaged over three or more runs.
The only exception is the \fIAM29000\fR, where the times are derived
from counting the clock ticks in the simulator.
Times are only given for the configuration of hardware/operating
system/compiler that proved fastest for a given machine.
For example, \f(TTgcc\fR produced slightly worse code on the \fISun\fR
systems than the vendor-supplied compiler.
The first number for \f(TTfgr\fR is the time returned by \f(TTclock()\fR,
reported in seconds.
The second number is the time for the entire run, as reported by \f(TTtime\fR.
.TS
center tab(:) ;
c c s c s c s c s
l | n l | n l | n l | n l | .
\fBMachine\fR:\fBfgr\fR:\fBfgrep\fR:\fBgrep\fR:\fBff\fR
:=:=:=:=:=:=:=:=
\fIAM29000\fR:(0.023):0.11::-::-::(0.02)
:_:_:_:_:_:_:_:_
\fIATT 3B1\fR:(0.877):1.96::(18.84)::(2.09)::(0.78)
:_:_:_:_:_:_:_:_
\fIATT 3B2/400\fR:(1.480):2.38::(7.09)::(4.12)::(0.36)
:_:_:_:_:_:_:_:_
\fICray II\fR:(1.233):1.40::(1.68)::(0.31)::(0.05)
:_:_:_:_:_:_:_:_
\fICray X-MP\fR:(0.162):0.27::(0.75)::(0.37)::(0.03)
:_:_:_:_:_:_:_:_
\fIDEC uVAX-II\fR:(1.127):2.03::(4.80)::(3.57)::(0.40)
:_:_:_:_:_:_:_:_
\fIDEC uVAX-III\fR:(0.460):0.77::(1.77)::(1.47)::(0.10)
:_:_:_:_:_:_:_:_
\fIEncore Multimax\fR:(0.806):1.40::(3.90)::(1.90)::(0.20)
:_:_:_:_:_:_:_:_
\fIGould PN9050\fR:(0.377):0.57::(1.33)::(1.10)::(0.07)
:_:_:_:_:_:_:_:_
\fIMIPS M/1000\fR:(0.150):0.29::(0.66)::(0.30)::(0.04)
:_:_:_:_:_:_:_:_
\fIMIPS M/2000\fR:(0.080):0.16::(0.40)::(0.16)::(0.04)
:_:_:_:_:_:_:_:_
\fISGI 3030\fR:(0.483):0.77::(2.77)::(1.87)::(0.17)
:_:_:_:_:_:_:_:_
\fISun 2/50\fR:(1.077):2.17::(7.07)::(6.63)::(0.67)
:_:_:_:_:_:_:_:_
\fISun 3/60\fR:(0.377):0.67::(2.27)::(1.60)::(0.17)
:_:_:_:_:_:_:_:_
\fISun 3/140\fR:(0.516):0.87::(3.13)::(2.30)::(0.23)
:_:_:_:_:_:_:_:_
\fISun 3/280\fR:(0.288):0.47::(1.47)::(1.07)::(0.17)
:_:_:_:_:_:_:_:_
\fISun 4/110\fR:(0.256):0.40::(1.33)::(1.00)::(0.10)
:_:_:_:_:_:_:_:_
\fISun 4/260\fR:(0.178):0.27::(0.80)::(0.80)::(0.00)
:_:_:_:_:_:_:_:_
.TE
.LP
A few comments:
.IP (1) 0.5i
I suspect that, on the faster machines, some of the
programs execute too quickly to be accurately measured.
For example, I doubt that the \fIMIPS M/1000\fR really
executes \f(TTff\fR as fast as the \fIMIPS M/2000\fR does.
Nor do I believe that the \fISun 4/260\fR is really instantaneous on
\f(TTff\fR.
The \fICRAY\fRs have more accuracy in their output from 'time'.
.IP (2) 0.5i
The \fIEncore Multimax\fR is a parallel machine with
8 68020s (each running at about 20MHz).
However, the compiler doesn't try to parallelize code unless
it is told to do so, so most of the times are closer to those
of a single 68020.
.IP (3) 0.5i
No one should take these times as definitive.
There are nuances among the machines that are not reported here.
Some (\fInot all\fR) examples are: the \fICRAY-II\fR used is not the fastest
\fICRAY-II\fR; the \fICRAY X-MP\fR was able to vectorise some code not
vectorised by the \fICRAY-II\fR (different versions of the compiler); etc.
-- 
	Steve Wampler
	{....!arizona!naucse!sbw}