Path: utzoo!utgpu!attcan!uunet!seismo!sundc!pitstop!sun!amdcad!ames!mailrus!ncar!noao!arizona!naucse!sbw
From: sbw@naucse.UUCP (Steve Wampler)
Newsgroups: comp.arch
Subject: Re: A request... (summary of responses - long)
Message-ID: <1004@naucse.UUCP>
Date: 3 Nov 88 00:54:05 GMT
Organization: Northern Arizona University, Flagstaff, AZ
Lines: 143

At the request of others, here is a summary of the performance measures
that I've received (so far) from other people on the net.

Let me start with a comment: ANYONE who uses these as realistic
benchmarks should be laughed off the net. These programs test various
algorithms/implementations for a very specialized test case. They might
provide some insight, but there are far better performance measures out
there.

Also, in retrospect, the file being searched is simply too small to
provide accurate measures on the more interesting machines. It would be
fairly easy to modify the file creation program to produce a file 10
times larger, but I cannot see asking people to donate 2.3MB of disk for
this task.

There were a few people who offered to help that I am unable to reach,
for various reasons (one person will apparently get the source file
sometime in the next 23 days, as near as I can tell from the messages
his host's mailer daemon sends me). I would like to thank you, and
apologize for not being able to contact you more personally.

The results are given here in tbl-troff source form. If you want to
look at them, and don't have tbl and/or troff, you might try to deduce
the results by examining this file.

My thanks to all the people who responded. I know some of you took a
fair amount of time to get times for your machines. If I can return the
favor (not likely - my time is the 3B1!) I'll see what I can do.

--- snip "Results.t" ---
.TL
Performance Measurements
.SH
Introduction
.LP
The following table gives the raw timings for several related programs
on a variety of computers.
Times reported are CPU times spent in user code.
.LP
The four test programs are (in order of appearance in the table):
.IP "fgr" 1i
\f(TTfgr\fR is a special case version of \fIfgrep\fR supplied by an
unnamed computer manufacturer.
It prints out the time spent in the search portion of its code, as
returned by the function \f(TTclock()\fR.
.IP "fgrep" 1i
\f(TTfgrep\fR is the \fIfgrep\fR program as found on the measured machine.
It is invoked with the \f(TT-c\fR option, searching for \f(TTkataveni\fR
in a data file equivalent to the one built internally by \f(TTfgr\fR.
.IP "grep" 1i
\f(TTgrep\fR is the \fIgrep\fR program as found on the measured machine.
It is invoked with the same arguments as \f(TTfgrep\fR.
.IP "ff" 1i
\f(TTff\fR is the implementation of the Boyer-Moore algorithm from the
book \fI"Software Tools"\fR by Webb Miller.
The only modification was to add support for a \f(TT-c\fR option.
It is invoked with the same arguments as \f(TTfgrep\fR.
.LP
In most cases, values are averaged over three or more runs; the only
exception is the \fIAM29000\fR, where the times are derived from
counting the clock ticks in the simulator.
Times are only given for the configuration of hardware/operating
system/compiler that proved fastest for a given machine.
For example, \f(TTgcc\fR produced slightly worse code on the \fISun\fR
systems than the vendor-supplied compiler.
The first number for \f(TTfgr\fR is the time returned by
\f(TTclock()\fR, reported in seconds.
The second number is the time for the entire run, as reported by
\f(TTtime\fR.
.TS
center tab(:) ;
c c s c s c s c s
l | n l | n l | n l | n l | .
\fBMachine\fR:\fBfgr\fR:\fBfgrep\fR:\fBgrep\fR:\fBff\fR
:=:=:=:=:=:=:=:=
\fIAM29000\fR:(0.023):0.11::-::-::(0.02)
:_:_:_:_:_:_:_:_
\fIATT 3B1\fR:(0.877):1.96::(18.84)::(2.09)::(0.78)
:_:_:_:_:_:_:_:_
\fIATT 3B2/400\fR:(1.480):2.38::(7.09)::(4.12)::(0.36)
:_:_:_:_:_:_:_:_
\fICray II\fR:(1.233):1.40::(1.68)::(0.31)::(0.05)
:_:_:_:_:_:_:_:_
\fICray X-MP\fR:(0.162):0.27::(0.75)::(0.37)::(0.03)
:_:_:_:_:_:_:_:_
\fIDEC uVAX-II\fR:(1.127):2.03::(4.80)::(3.57)::(0.40)
:_:_:_:_:_:_:_:_
\fIDEC uVAX-III\fR:(0.460):0.77::(1.77)::(1.47)::(0.10)
:_:_:_:_:_:_:_:_
\fIEncore Multimax\fR:(0.806):1.40::(3.90)::(1.90)::(0.20)
:_:_:_:_:_:_:_:_
\fIGould PN9050\fR:(0.377):0.57::(1.33)::(1.10)::(0.07)
:_:_:_:_:_:_:_:_
\fIMIPS M/1000\fR:(0.150):0.29::(0.66)::(0.30)::(0.04)
:_:_:_:_:_:_:_:_
\fIMIPS M/2000\fR:(0.080):0.16::(0.40)::(0.16)::(0.04)
:_:_:_:_:_:_:_:_
\fISGI 3030\fR:(0.483):0.77::(2.77)::(1.87)::(0.17)
:_:_:_:_:_:_:_:_
\fISun 2/50\fR:(1.077):2.17::(7.07)::(6.63)::(0.67)
:_:_:_:_:_:_:_:_
\fISun 3/60\fR:(0.377):0.67::(2.27)::(1.60)::(0.17)
:_:_:_:_:_:_:_:_
\fISun 3/140\fR:(0.516):0.87::(3.13)::(2.30)::(0.23)
:_:_:_:_:_:_:_:_
\fISun 3/280\fR:(0.288):0.47::(1.47)::(1.07)::(0.17)
:_:_:_:_:_:_:_:_
\fISun 4/110\fR:(0.256):0.40::(1.33)::(1.00)::(0.10)
:_:_:_:_:_:_:_:_
\fISun 4/260\fR:(0.178):0.27::(0.80)::(0.80)::(0.00)
:_:_:_:_:_:_:_:_
.TE
.LP
A few comments:
.IP (1) 0.5i
I suspect that, on the faster machines, some of the programs execute
too quickly to be accurately measured.
For example, I doubt that the \fIMIPS M/1000\fR really executes
\f(TTff\fR as fast as the \fIMIPS M/2000\fR does.
Nor do I believe that the \fISun 4/260\fR is really instantaneous on
\f(TTff\fR.
The \fICRAY\fRs have more accuracy in their output from 'time'.
.IP (2) 0.5i
The \fIEncore Multimax\fR is a parallel machine with 8 68020s (each
running at about 20MHz).
However, the compiler doesn't try to parallelize code unless it is told
to do so, so most of the times are closer to that of a single 68020.
.IP (3) 0.5i
No one should take these times as definitive.
There are nuances among the machines that are not reported here.
Some (\fInot all\fR) examples: the \fICRAY-II\fR used is not the fastest
\fICRAY-II\fR; the \fICRAY X-MP\fR was able to vectorise some code not
vectorised by the \fICRAY-II\fR (different versions of the compiler); etc.

--
Steve Wampler
{....!arizona!naucse!sbw}