Path: utzoo!mnetor!uunet!lll-winken!lll-tis!mordor!sri-spam!sri-unix!garth!walter
From: walter@garth.UUCP (Walter Bays)
Newsgroups: comp.arch
Subject: Re: For a good time, read...
Message-ID: <592@garth.UUCP>
Date: 7 Apr 88 00:44:08 GMT
References: <7841@apple.Apple.Com>
Reply-To: walter@garth.UUCP (Walter Bays)
Organization: INTERGRAPH (APD) -- Palo Alto, CA
Lines: 143
Summary: Beware Black Box Benchmarks

In article <7841@apple.Apple.Com> bcase@Apple.COM (Brian Case) writes:
> There is a wonderful article in EE Times this week.  Starting on page 49
> and continuing on page 54, the article, entitled "CISC beats RISC in test,"
> summarizes the results of a battery of tests performed by Neal Nelson &
> Associates.

Although the Neal Nelson benchmarks have a number of weaknesses, they
make a significant contribution to assessing multi-user performance.
Even single-user workstations usually have several processes running.
Simple benchmarks like Whetstone and Dhrystone are not, by themselves,
good predictors of application performance in such an environment.

If you're selecting a system, your benchmark should accurately
represent your real workload:  single user, multi-user, embedded
system, floating point, integer, array references, etc.  ONE SIZE DOES
NOT FIT ALL.  Different CPUs are better suited to different applications.

The Neal Nelson benchmarks were developed when CPUs were much slower,
and their timing accuracy has degraded accordingly.  On fast CPUs, the
times reported for one copy of the program are generally around 1
second, plus or minus 1 second.  Neal Nelson points this out and
recommends comparing results for 15 to 20 copies for greater accuracy.
Even those times are often only in the 10-60 second range, so the
accuracy is still less than it should be.  However, at 15-20
concurrently active users (30-200 logged-in users), memory, paging, and
disk effects dominate over CPU speed.
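
A short Python sketch puts numbers on this.  It assumes, as stated
above, that any reported time may be off by up to 1 second either way;
`worst_case_relative_error` and `ratio_bounds` are hypothetical helpers,
not part of the Neal Nelson suite.  The example pairs are the Sun-3 and
MIPS M-500 times at 1 and 15 copies from the EE Times table later in
this article.

```python
def worst_case_relative_error(reported_s: float, resolution_s: float = 1.0) -> float:
    """Largest possible relative error of a time reported to +/- resolution_s."""
    return resolution_s / reported_s


def ratio_bounds(a_s: float, b_s: float, resolution_s: float = 1.0) -> tuple[float, float]:
    """Range of the true time ratio a/b when each time may be off by resolution_s."""
    low = max(a_s - resolution_s, 0.0) / (b_s + resolution_s)
    # If b could be (nearly) zero, the ratio has no finite upper bound.
    high = (a_s + resolution_s) / (b_s - resolution_s) if b_s > resolution_s else float("inf")
    return low, high


if __name__ == "__main__":
    for t in (1.0, 10.0, 60.0):
        print(f"{t:5.1f} s reported: up to {worst_case_relative_error(t):.1%} error")
    # Sun-3 vs. MIPS M-500: 2 s vs. 3 s at 1 copy, 26 s vs. 29 s at 15 copies.
    print("1 copy:  ", ratio_bounds(2, 3))
    print("15 copies:", ratio_bounds(26, 29))
```

At 1 copy the true Sun-3/MIPS time ratio could be anywhere from 0.25 to
1.5, so the ranking is undetermined; at 15 copies the whole range (about
0.83 to 0.96) lies below 1, so the Sun-3's lead survives the timing
uncertainty.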

> They compared comparably-configured (say it three times fast)
                ^^^^^^^^^^ ^^^^^^^^^^
> workstations.  The SUN-3 was a 25 MHz CPU with 16 Meg of memory.  The other
> computers were the two models of the IBM RT (slug city), the Intergraph
> 32C (slightly less sluggish), the MIPS M-500, the SUN-4, and HP's 9000 and
                     ^^^^^^^^ [see published results below]
> 825.

They are _not_ comparably configured workstations.  6 MB is _not_
comparable to 32 MB when running multi-user applications.  Though
the article didn't give details of machine configurations, the
Intergraph appears to be an old model with 6 MB of memory and old
system software.  The article does not state which Sun models
were tested, but it appears to be based on a Neal Nelson report
comparing a Sun 3/260 with 16 MB of memory and a 20 ms disk against a
Sun 4/280 with 32 MB of memory and an 18 ms disk.  If the Sun-4 results
had instead come from the less expensive Sun 4/110, which has no cache,
we would expect the 4/280 to run faster.  MIPS also has two models
above the M-500.

In both Intergraph workstations and Clipper PC-AT add-in cards, we
generally use 4-6 MB for single-user machines and 8-16 MB for several
users.  Most of Intergraph's current models come with 16-80 MB.

> The results seem to show that as the number of running processes
> goes up, the advantage of RISC drops.  The crossover point was often 12
> processes (the UNIX kernels of the RISC machines must have had a clause
> "if (procs >= 12) becomeCISC ();"  :-) :-).

The only results published in EE Times (4/4/88) were for an unspecified
benchmark, but it's probably "Test 1", a "normal" mix of calculations
and I/O:

Elapsed time in seconds (lower is faster), by number of simultaneous
copies (left column):

              IBM  Intergraph  MIPS           IBM     HP-9000   HP
      Sun-3   RT-25    32C    M-500   Sun-4  RT-115    /840     825
 1       2      12       2       3       2       4       4       2
 3       6      37       5       6       6      12      10       7
 5       6      65      10      10       9      20      18      11
 7      12      87      12      14      13      27      24      15
 9      15     113      17      18      17      36      31      19
11      19     135      21      22      21      44      37      24
13      23     163      24      26      25      53      44      28
15      26     192      30      29      30      62      50      32

Averaging the eight rows for each machine (the mean number of copies is
8) gives:

              IBM  Intergraph  MIPS           IBM       HP-9000 HP
copies Sun-3   RT-25    32C    M-500   Sun-4  RT-115    /840    825
 8      13.6   100.5    15.1    16.0    15.4    32.2    27.2    17.2
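
The averages can be reproduced from the table with a few lines of
Python.  The data are transcribed from the table above; the dictionary
layout and names are just one way to hold them, and, as in the text,
the Sun-3 is treated as the only CISC machine in the set.

```python
from statistics import mean

# Elapsed times from the EE Times table, one entry per copy count.
COPIES = [1, 3, 5, 7, 9, 11, 13, 15]
TIMES = {
    "Sun-3":          [2, 6, 6, 12, 15, 19, 23, 26],
    "IBM RT-25":      [12, 37, 65, 87, 113, 135, 163, 192],
    "Intergraph 32C": [2, 5, 10, 12, 17, 21, 24, 30],
    "MIPS M-500":     [3, 6, 10, 14, 18, 22, 26, 29],
    "Sun-4":          [2, 6, 9, 13, 17, 21, 25, 30],
    "IBM RT-115":     [4, 12, 20, 27, 36, 44, 53, 62],
    "HP-9000/840":    [4, 10, 18, 24, 31, 37, 44, 50],
    "HP 825":         [2, 7, 11, 15, 19, 24, 28, 32],
}

averages = {name: mean(times) for name, times in TIMES.items()}
# Every machine except the Sun-3 counts as RISC here.
fastest_risc = min((n for n in averages if n != "Sun-3"), key=averages.get)

if __name__ == "__main__":
    print(f"mean copies: {mean(COPIES)}")
    for name, avg in sorted(averages.items(), key=lambda kv: kv[1]):
        print(f"{name:15s} {avg:7.3f}")
    print("fastest RISC machine:", fastest_risc)
```

The unrounded means are 13.625, 100.5, 15.125, 16, 15.375, 32.25, 27.25,
and 17.25 seconds, in column order.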

On this benchmark, Intergraph is the fastest of the RISC machines.
That hardly supports the characterization of it as "sluggish".  Does
this mean that Intergraph is faster than all the other RISC machines on
every workload?  Of course not!  Does it mean that the Sun-3 is faster
than all RISC machines on every workload?  Of course not!!

> On at least one test, the SUN-3 ran 18 times faster than the Intergraph!

Most likely, with many copies of a large program running, the 6 MB
Intergraph was paging itself to death, while the 16 MB Sun-3 ran in
memory.  The 24 MB HP-9000 and 32 MB Sun-4 were probably quite happy,
too.
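
The paging explanation can be made concrete with a deliberately crude
model: a machine stays in memory until the combined working sets of all
copies, plus whatever the kernel and other resident software occupy,
exceed physical memory.  The per-copy and resident sizes below are
hypothetical placeholders, not figures from the Neal Nelson reports;
only the 6, 16, and 32 MB totals come from this post.

```python
def fits_in_memory(copies: int, physical_kb: int, resident_kb: int, per_copy_kb: int) -> bool:
    """True if `copies` working sets plus resident memory fit in physical memory."""
    return copies * per_copy_kb + resident_kb <= physical_kb


def max_copies_without_paging(physical_kb: int, resident_kb: int, per_copy_kb: int) -> int:
    """Largest copy count that still fits; beyond it this model predicts paging."""
    free_kb = physical_kb - resident_kb
    return max(free_kb, 0) // per_copy_kb


if __name__ == "__main__":
    # HYPOTHETICAL sizes: 2 MB for the kernel etc., 512 KB per benchmark copy.
    RESIDENT_KB = 2 * 1024
    PER_COPY_KB = 512
    for label, mb in (("6 MB Intergraph", 6), ("16 MB Sun-3", 16), ("32 MB Sun-4", 32)):
        limit = max_copies_without_paging(mb * 1024, RESIDENT_KB, PER_COPY_KB)
        print(f"{label}: pages beyond {limit} copies")
```

With these made-up sizes, the 6 MB machine starts paging beyond 8
copies, while the 16 MB and 32 MB machines stay in memory up to 28 and
60 copies.  Other assumed sizes move the thresholds, but a larger
memory always reaches its threshold later.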

> I recommend this article for some amusing reading!
>
> Of course, I suspect the SUN-3 kernel is highly tuned and the others are
> not as much so; also, what disk interface do these machines use?  SCSI,
> ESDI, SMD?  And, note that the SUN is running at 25 MHz, while the MIPS
> and IBM systems are running 8 and 10 MHz (6 MHz for the old model!).
> Also note that the Intergraph is running at 30 MHz!

The old 32Cs, though fast for 1985, used fairly slow disks, slow
compilers, a slow I/O co-processor, and untuned kernels compared to
current models.  Also, current Intergraph Clipper C100 models run at
33 MHz.  The new Clipper C300 (3Q88) runs at 50 MHz and has some other
internal speed-ups.

> Whoever uses the highest-speed disk interface will likely win.  So, this
> is probably less of a processor comparison than a system comparison.

Right.  It's a system comparison, and the systems are not comparably
configured.  Still, Neal Nelson and EE Times make a valid point: the
performance advantage of RISC generally lessens with increased user
load.  This point deserves more discussion in this forum.  There are
two main architectural reasons for this effect, both of which were
specifically addressed in the design of the Clipper.

1) A load/store architecture depends on moving heavily used variables
to registers (via optimizing compilers) or to cache memory.  Context
switches tend to flush the cache and require saving registers.  The
Clipper uses a 2-way set associative cache instead of a direct mapped
cache, which makes collisions between the working sets of different
processes less likely.  A separate register set is provided for
supervisor mode, so although you have to save registers once per
context switch, you don't have to save them twice.

2) A sliding register window provides fast subroutine calls if the call
depth is not greater than the number of hardware windows and does not
change too often (perfect for Dhrystones).  However, with register
windows, a context switch must save every window in use, an excessive
number of registers.  (We have seen the Sun-4 run 12 times the speed of
a 780 on Dhrystones, yet slower than a 780 on context switching.)  The
Clipper uses a conventional register architecture.
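
Both architectural points can be illustrated with toy Python models.
`Cache` compares a direct-mapped cache with a 2-way set associative
cache of the same total size (8 lines of 16 bytes, a hypothetical
geometry, not the Clipper's) on a trace that alternates between one hot
line from each of two processes, as happens when they take turns on the
CPU.  `WindowedRegisters` counts window overflow and underflow traps for
a call sequence and the window registers that must be saved at a
context switch; the 4 windows, 16 registers per window, and 16-register
conventional file are illustrative assumptions, not Sun-4 or Clipper
specifications.  `conventional_switch_saves` simply encodes the
save-once versus save-twice point made above for a separate supervisor
register set.

```python
from collections import OrderedDict


class Cache:
    """LRU set-associative cache model; ways=1 makes it direct mapped."""

    def __init__(self, num_sets: int, ways: int, line_bytes: int = 16):
        self.num_sets, self.ways, self.line_bytes = num_sets, ways, line_bytes
        self.sets = [OrderedDict() for _ in range(num_sets)]  # tags, LRU first
        self.misses = 0

    def access(self, addr: int) -> None:
        line = addr // self.line_bytes
        tags = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in tags:
            tags.move_to_end(tag)          # hit: mark most recently used
            return
        self.misses += 1
        if len(tags) == self.ways:
            tags.popitem(last=False)       # evict least recently used line
        tags[tag] = True


class WindowedRegisters:
    """Sliding register windows: fast calls until the call depth overflows."""

    def __init__(self, n_windows: int, regs_per_window: int = 16):
        self.n_windows, self.regs_per_window = n_windows, regs_per_window
        self.depth = 1      # frames on the call stack
        self.resident = 1   # frames currently held in hardware windows
        self.overflows = self.underflows = 0

    def call(self) -> None:
        self.depth += 1
        if self.resident == self.n_windows:
            self.overflows += 1            # trap: spill oldest window to memory
        else:
            self.resident += 1

    def ret(self) -> None:
        if self.depth == 1:
            raise RuntimeError("return past outermost frame")
        self.depth -= 1
        self.resident -= 1
        if self.resident == 0:
            self.underflows += 1           # trap: reload caller's window
            self.resident = 1

    def context_switch_saves(self) -> int:
        return self.resident * self.regs_per_window   # every window in use


def conventional_switch_saves(n_regs: int, separate_supervisor_set: bool) -> int:
    """Without a separate supervisor set, registers are saved twice per switch."""
    return n_regs if separate_supervisor_set else 2 * n_regs


if __name__ == "__main__":
    trace = [0, 128] * 10      # hypothetical hot lines, one per process, same set
    direct, two_way = Cache(num_sets=8, ways=1), Cache(num_sets=4, ways=2)
    for addr in trace:
        direct.access(addr)
        two_way.access(addr)
    print("misses, direct mapped:", direct.misses, " 2-way:", two_way.misses)

    regs = WindowedRegisters(n_windows=4)
    for _ in range(9):
        regs.call()
    print("window registers to save at depth 10:", regs.context_switch_saves())
    for _ in range(9):
        regs.ret()
    print("overflow/underflow traps:", regs.overflows, regs.underflows)
    print("conventional saves:", conventional_switch_saves(16, True))
```

With the same 8 lines of storage, the direct-mapped cache misses on all
20 accesses while the 2-way cache misses only on the first two.  After
calls to a depth of 10 with 4 windows, a context switch must save 64
window registers, and the round trip costs 6 overflow and 6 underflow
traps, while a call chain that stays within the windows causes none.
Real trap handlers and kernels add costs this model ignores, so it shows
the shape of the effect, not its size.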

> Read the article, then make comments.

You certainly manage to instigate some lively discussions on the net.  I'm
sure this one will be no exception.
-- 
------------------------------------------------------------------------------
Any similarities between my opinions and those of the
person who signs my paychecks is purely coincidental.
E-Mail route: ...!pyramid!garth!walter
USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California 94303
Phone: (415) 852-2384
------------------------------------------------------------------------------