Path: utzoo!mnetor!uunet!lll-winken!lll-tis!mordor!sri-spam!sri-unix!garth!walter
From: walter@garth.UUCP (Walter Bays)
Newsgroups: comp.arch
Subject: Re: For a good time, read...
Message-ID: <592@garth.UUCP>
Date: 7 Apr 88 00:44:08 GMT
References: <7841@apple.Apple.Com>
Reply-To: walter@garth.UUCP (Walter Bays)
Organization: INTERGRAPH (APD) -- Palo Alto, CA
Lines: 143
Summary: Beware Black Box Benchmarks

In article <7841@apple.Apple.Com> bcase@Apple.COM (Brian Case) writes:
> There is a wonderful article in EE Times this week.  Starting on page 49
> and continuing on page 54, the article, entitled "CISC beats RISC in test,"
> summarizes the results of a battery of tests performed by Neal Nelson &
> Associates.

Although there are a number of weaknesses in the Neal Nelson benchmarks,
they make a significant contribution in assessing multi-user performance.
Even single-user workstations will usually have multiple processes
running.  Simple benchmarks like Whetstone and Dhrystone are not by
themselves good predictors of application performance in such an
environment.

If you're selecting a system, your benchmark should accurately represent
your real workload: single user, multi-user, embedded system, floating
point, integer, array references, etc.  ONE SIZE DOES NOT FIT ALL.
Different CPU's are better for different applications.

The Neal Nelson benchmarks were developed when CPU's were much slower,
and timing accuracy has degraded accordingly.  On fast CPU's, times
reported for one copy of the program are generally around 1 second, plus
or minus 1 second.  Neal Nelson points this out, and recommends that
results between 15 and 20 copies be compared for greater accuracy.  These
times are still often in the 10-60 second range, so the accuracy is still
less than it should be.  However, at 15-20 concurrently active users -
30-200 logged users - memory, paging, and disk effects dominate over CPU
speed.
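To put rough numbers on the timing-accuracy point above, here is a minimal sketch.  It assumes the "plus or minus 1 second" uncertainty quoted for the one-copy times applies unchanged to longer runs; that is an assumption for illustration, not something the Neal Nelson report is known to state.  The function name relative_error is hypothetical.

```python
def relative_error(measured_seconds, resolution_seconds=1.0):
    """Worst-case relative error of a timing, given the clock resolution.

    Assumption (not from the benchmark report): the error is a fixed
    +/- resolution_seconds regardless of how long the run takes.
    """
    if measured_seconds <= 0:
        raise ValueError("measured time must be positive")
    return resolution_seconds / measured_seconds


if __name__ == "__main__":
    # 1 s is the typical one-copy time on a fast CPU; 10 s and 60 s
    # bracket the range mentioned for the 15-20 copy runs.
    for t in (1, 10, 60):
        print(f"{t:>3} s measured: +/- {relative_error(t):.1%}")
```

Under that assumption a 1-second result is uncertain by its own size, while results in the 10-60 second range are uncertain by roughly 10% down to under 2%.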
> They compared comparably-configured (say it three times fast)
                ^^^^^^^^^^ ^^^^^^^^^^
> workstations.  The SUN-3 was a 25 MHz CPU with 16 Meg of memory.  The other
> computers were the two models of the IBM RT (slug city), the Intergraph
> 32C (slightly less sluggish), the MIPS M-500, the SUN-4, and HP's 9000 and
                     ^^^^^^^^ [see published results below]
> 825.

They are _not_ comparably configured workstations.  6 MB is _not_
comparable to 32 MB when running multi-user applications.  Though the
article didn't give details of machine configurations, the Intergraph
appears to be an old model with 6 MB of memory and old system software.

The article does not state which Sun models were tested, but appears to
be based on a Neal Nelson report comparing a Sun 3/260 with 16 MB of
memory and a 20 ms disk against a Sun 4/280 with 32 MB of memory and an
18 ms disk.  If the results were from the less expensive Sun 4/110 which
has no cache, we would expect the 4/280 to run faster.  MIPS has two
models above the M-500.

Both in Intergraph workstations, and in Clipper PC-AT add-in cards, we
generally use 4-6 MB for single user machines, and 8-16 MB for several
users.  Most of Intergraph's current models come with 16-80 MB.

> The results seem to show that as the number of running processes
> goes up, the advantage of RISC drops.  The crossover point was often 12
> processes (the UNIX kernels of the RISC machines must have had a clause
> "if (procs >= 12) becomeCISC ();" :-) :-).
The only results published in EE Times (4/4/88) were for an unspecified
benchmark, but it's probably "Test 1", a "normal" mix of calculations and
I/O:

# of
simultaneous         IBM    Intergraph  MIPS          IBM     HP-9000  HP
copies        Sun-3  RT-25  32C         M-500  Sun-4  RT-115  /840     825
1             2      12     2           3      2      4       4        2
3             6      37     5           6      6      12      10       7
5             6      65     10          10     9      20      18       11
7             12     87     12          14     13     27      24       15
9             15     113    17          18     17     36      31       19
11            19     135    21          22     21     44      37       24
13            23     163    24          26     25     53      44       28
15            26     192    30          29     30     62      50       32

Averaging these results for each machine gives:

                     IBM    Intergraph  MIPS          IBM     HP       HP
copies        Sun-3  RT-25  32C         M-500  Sun-4  RT-115  9000     825
8             13.6   100.5  15.1        16     15.4   32.2    27.2     17.2

On this benchmark, Intergraph is the fastest of the RISC machines.  That
hardly supports the characterization of it as "sluggish".  Does this mean
that Intergraph is faster than all the other RISC machines on every
workload?  Of course not!  Does it mean that the Sun-3 is faster than all
RISC machines on every workload?  Of course not!!

> On at least one test, the SUN-3 ran 18 times faster than the Intergraph!

Most likely, with many copies of a large program running, the 6 MB
Intergraph was paging itself to death, while the 16 MB Sun-3 ran in
memory.  The 24 MB HP-9000 and 32 MB Sun-4 were probably quite happy, too.

> I recommend this article for some amusing reading!
>
> Of course, I suspect the SUN-3 kernel is highly tuned and the others are
> not as much so; also, what disk interface do these machines use?  SCSI,
> ESDI, SMD?  And, note that the SUN is running at 25 MHz, while the MIPS
> and IBM systems are running 8 and 10 MHz (6 MHz for the old model!).
> Also note that the Intergraph is running at 30 MHz!

The old 32C's, though fast for 1985, used some fairly slow disks, slow
compilers, slow I/O co-processor, and untuned kernels compared to current
models.  Also, current Intergraph Clipper C100 models run at 33 MHz.  The
new Clipper C300 (3Q88) runs at 50 MHz, and has some other internal
speed-ups.
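For anyone who wants to check the per-machine averages given earlier for the published results, a short sketch follows.  The figures are copied from the table as printed; the names MACHINES, RESULTS and column_means are made up for this example, and the mapping of the flattened header onto eight machine columns is my reading of the table.

```python
# Machine columns, in the order they appear in the published table.
MACHINES = [
    "Sun-3", "IBM RT-25", "Intergraph 32C", "MIPS M-500",
    "Sun-4", "IBM RT-115", "HP-9000/840", "HP 825",
]

# Number of simultaneous copies -> reported result for each machine.
RESULTS = {
    1:  [2, 12, 2, 3, 2, 4, 4, 2],
    3:  [6, 37, 5, 6, 6, 12, 10, 7],
    5:  [6, 65, 10, 10, 9, 20, 18, 11],
    7:  [12, 87, 12, 14, 13, 27, 24, 15],
    9:  [15, 113, 17, 18, 17, 36, 31, 19],
    11: [19, 135, 21, 22, 21, 44, 37, 24],
    13: [23, 163, 24, 26, 25, 53, 44, 28],
    15: [26, 192, 30, 29, 30, 62, 50, 32],
}


def column_means(results):
    """Average each machine's column over all rows of the table."""
    rows = list(results.values())
    return [sum(col) / len(col) for col in zip(*rows)]


MEANS = dict(zip(MACHINES, column_means(RESULTS)))

if __name__ == "__main__":
    for name, mean in MEANS.items():
        print(f"{name:<15}{mean:8.3f}")
```

The unrounded means come out as 13.625, 100.5, 15.125, 16, 15.375, 32.25, 27.25 and 17.25, which agree with the one-decimal figures quoted in the post.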
> Whoever uses the highest-speed disk interface will likely win.  So, this
> is probably less of a processor comparison than a system comparison.

Right.  It's a system comparison, and the systems are not comparably
configured.

Neal Nelson and EE Times make a valid point, that the performance
advantage of RISC generally lessens with increased user load.  This point
deserves more discussion in this forum.  There are two main architectural
reasons for this effect, both of which were specifically addressed in the
design of the Clipper.

1) A load/store architecture depends on moving heavily used variables to
   registers (via optimizing compilers) or to cache memory.  Context
   switches tend to flush the cache and require saving registers.  The
   Clipper uses a 2-way set associative cache instead of a direct mapped
   cache.  A separate register set is provided for supervisor mode so,
   although you have to save registers once per context switch, you don't
   have to save them twice.

2) A sliding register window provides fast subroutine calls if the depth
   is not greater than the number of hardware levels and does not change
   too often (perfect for Dhrystones).  However, using register windows,
   context switches require an excessive number of register saves.  (We
   have seen the Sun-4 run 12 times the speed of a 780 on Dhrystones, yet
   slower than a 780 on context switching.)  The Clipper uses a
   conventional register architecture.

> Read the article, then make comments.

You certainly manage to instigate some lively discussions on the net.
I'm sure this one will be no exception.
--
------------------------------------------------------------------------------
Any similarities between my opinions and those of the
person who signs my paychecks is purely coincidental.
E-Mail route: ...!pyramid!garth!walter
USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California 94303
Phone: (415) 852-2384
------------------------------------------------------------------------------