Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!sun-barr!decwrl!amdcad!proton!davec
From: davec@proton.amd.com (Dave Christie)
Newsgroups: comp.arch
Subject: Re: KM's vs. Supers (medium)
Summary: Can't exclude load time from performance comparisons
Message-ID: <28693@amdcad.AMD.COM>
Date: 8 Jan 90 20:38:42 GMT
References: <34030@mips.mips.COM> <4322@nttmhs.ntt.JP> <39807@ames.arc.nasa.gov> <4328@scolex.sco.COM>
Sender: news@amdcad.AMD.COM
Reply-To: davec@proton.amd.com (Dave Christie)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 51

In article <4328@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes:
[Trying to understand how a 66MHz KM could possibly beat a 250MHz super]
>
>Ok.  I've just sat down with a list, and tried to figure something out;
>hopefully, *somebody* can help me on this.  My "list" was a list of timings
>for a CDC Cyber 170/760, running at 40MHz (thanks Brian!).  From experience,
>I'd say that the list is correct (i.e., not a lie 8-)).  Excluding loads,
                                                          ^^^^^^^^^^^^^^^
>divides, and other memory references, average cycle count is between 2 and 3
>clocks / instruction.
 [...]
>according to Eugene Brooks, a 66MHz R6000 will outperform a Cray-2, which
>runs at, what, 250MHz?
>
>So, *how* does it do this?  The things I could come up with were:  the

First of all, one simply can't exclude loads in coming up with a meaningful
CPI figure (divides, maybe).  And it is loads that probably make the most
difference in a KM/super comparison (specifically, loads that can't be
initiated well ahead of when the data is needed).  I don't know the memory
access time on a 760 offhand, but it will certainly be several 25ns clocks
(the 760 has no cache).  I'm willing to bet Cray-2 memory access is nowhere
near the same number of 4ns clocks; in fact I wouldn't be surprised if total
memory access time were longer on the Cray-2 (damn, I hate speculating
without the facts, but I think my point is still valid).  Considering that
loads tend to amount to 25-30% of instructions (if I may generalize), both
of these machines will spend a large amount of time simply waiting for
memory access (on certain codes), and since the memory access time doesn't
scale at 250/40, overall performance won't either.  (Warning: this gross
generalization doesn't include other aspects such as branches and is
merely for illustrative purposes.)  However, KMs, having caches, tend
to have much shorter effective memory access times (on many codes) and
so eliminate much of the time spent on loads, which could amount to
something like 75% of the overall time spent on a problem.  So a 66MHz
KM could easily beat a super that is effectively running at 125MHz (I
believe the minimum instruction time on a 250MHz Cray-2 is two cycles).
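
To put rough numbers on the argument above, here is a minimal sketch of a
time-per-instruction model.  Every figure in it is an assumption chosen for
illustration (the 80ns and 5ns load waits, the base cycle counts, the 30%
load fraction); none of them is a measured value for a Cray-2, a 760, or an
R6000, and the function name is made up for this example.

```python
# Illustrative model only: every number below is an assumption, not a
# measured figure for any real machine.

def ns_per_instruction(clock_mhz, base_cpi, load_fraction, load_stall_ns):
    """Average time per instruction, in nanoseconds.

    clock_mhz     -- clock rate in MHz
    base_cpi      -- cycles per instruction, ignoring time spent waiting on loads
    load_fraction -- fraction of instructions that are loads
    load_stall_ns -- average wait per load that cannot be issued early
    """
    cycle_ns = 1000.0 / clock_mhz
    return base_cpi * cycle_ns + load_fraction * load_stall_ns


LOAD_FRACTION = 0.30  # upper end of the 25-30% generalization

# Hypothetical "super": 250MHz, two-cycle minimum instruction time, no cache,
# and an assumed 80ns wait per load (20 clocks at 4ns).
super_ns = ns_per_instruction(250, 2.0, LOAD_FRACTION, 80.0)

# Hypothetical "KM": 66MHz, one cycle per instruction, with cache hits
# assumed to cut the average load wait to 5ns.
km_ns = ns_per_instruction(66, 1.0, LOAD_FRACTION, 5.0)

# Share of the super's time that goes to waiting on loads.
super_load_share = (LOAD_FRACTION * 80.0) / super_ns

print(f"super: {super_ns:.1f} ns/instruction, "
      f"{super_load_share:.0%} of it waiting on loads")
print(f"KM:    {km_ns:.1f} ns/instruction")
```

With these made-up inputs the super spends about three quarters of its time
waiting on loads and the slower-clocked KM comes out ahead.  Shrink the
assumed 80ns wait (loads issued well ahead of use, as in vectorizable code)
and the result flips back in the super's favor.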

Remember, this demonstration is only valid for scalar codes where loads
cannot be initiated well ahead of time, which is precisely the sort of
code that RISC architectures are optimized for.  Supers, on the other
hand, have a much different heritage - numeric codes with lots of
parallelism.  So super-bashing really isn't warranted - it's largely an
apples-to-oranges comparison.

>-- 
>Sean Eric Fagan  | "Time has little to do with infinity and jelly donuts."
>seanf@sco.COM    |    -- Thomas Magnum (Tom Selleck), _Magnum, P.I._
>(408) 458-1422   | Any opinions expressed are my own, not my employers'.

------------
Dave Christie            My opinions only, not my employer's