Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!sun-barr!decwrl!amdcad!proton!davec
From: davec@proton.amd.com (Dave Christie)
Newsgroups: comp.arch
Subject: Re: KM's vs. Supers (medium)
Summary: Can't exclude load time from performance comparisons
Message-ID: <28693@amdcad.AMD.COM>
Date: 8 Jan 90 20:38:42 GMT
References: <34030@mips.mips.COM> <4322@nttmhs.ntt.JP> <39807@ames.arc.nasa.gov> <4328@scolex.sco.COM>
Sender: news@amdcad.AMD.COM
Reply-To: davec@proton.amd.com (Dave Christie)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 51
Expires:
Sender:
Followup-To:

In article <4328@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes:
[Trying to understand how a 66MHz KM could possibly beat a 250MHz super]
>
>Ok. I've just sat down with a list, and tried to figure something out;
>hopefully, *somebody* can help me on this. My "list" was a list of timings
>for a CDC Cyber 170/760, running at 40MHz (thanks Brian!). From experience,
>I'd say that the list is correct (i.e., not a lie 8-)). Excluding loads,
                                                         ^^^^^^^^^^^^^^^
>divides, and other memory references, average cycle count is between 2 and 3
>clocks / instruction. [...]
>according to Eugene Brooks, a 66MHz R6000 will outperform a Cray-2, which
>runs at, what, 250MHz?
>
>So, *how* does it do this? The things I could come up with were: the

First of all, one simply can't exclude loads in coming up with a
meaningful CPI figure (divides, maybe). And it is loads that probably
make the most difference in a KM/super comparison (specifically, loads
that can't be initiated well ahead of when the data is needed). I don't
know the memory access time on a 760 offhand, but it will certainly be
several 25ns clocks (the 760 has no cache). I'm willing to bet Cray-2
memory access is nowhere near the same number of 4ns clocks; in fact I
wouldn't be surprised if total memory access time were longer on the
Cray-2 (damn, I hate speculating without the facts, but I think my point
is still valid).

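To put rough numbers on why loads can't be left out of a CPI figure,
here is a minimal sketch of the usual "base CPI plus load stalls" model.
Every number in it is an illustrative assumption, not a measurement of
any real machine: the 30% load fraction, the 50-clock uncached memory
access, the 95% cache hit rate, the 10-clock miss penalty and the base
CPI values are all made up for the sake of the arithmetic, and the
function name is hypothetical.

```python
def ns_per_instruction(clock_mhz, base_cpi, load_fraction, avg_load_stall_cycles):
    """Average time per instruction in nanoseconds.

    effective CPI = base CPI + (fraction of loads) * (average stall per load)
    This ignores branches, overlap of loads with other work, vector
    operation, and everything else; it is only meant to show the trend.
    """
    cycle_ns = 1000.0 / clock_mhz
    effective_cpi = base_cpi + load_fraction * avg_load_stall_cycles
    return effective_cpi * cycle_ns


# ASSUMED figures for a cacheless 250MHz super on scalar code:
# 2 clocks minimum per instruction, every load stalls 50 clocks.
super_ns = ns_per_instruction(250.0, 2.0, 0.3, 50.0)

# ASSUMED figures for a cached 66MHz KM:
# 1 clock per instruction, 5% of loads miss and stall 10 clocks.
km_ns = ns_per_instruction(66.0, 1.0, 0.3, 0.05 * 10.0)

print("super: %.1f ns/instruction" % super_ns)
print("KM:    %.1f ns/instruction" % km_ns)
print("KM faster on this model:", km_ns < super_ns)
```

With these particular assumptions the slower-clocked machine comes out
ahead; drop the super's stall per load to below about 8 clocks (i.e.
loads that can be started well ahead of need) and the result flips.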
Considering that loads tend to amount to 25-30% of instructions (if I
may generalize), both of these machines will spend a large amount of
time simply waiting for memory access (on certain codes), and since the
memory access time doesn't scale at 250/40, overall performance won't
either. (Warning: this gross generalization doesn't include other
aspects such as branches and is merely for illustrative purposes.)

However, KMs, having caches, tend to have much shorter effective memory
access times (on many codes) and so eliminate much of the time spent on
loads, which could amount to something like 75% of the overall time
spent on a problem. So a 66MHz KM could easily beat what is effectively
a 125MHz super (I believe the minimum instruction time on a Cray-2 is
two cycles).

Remember, this demonstration is only valid for scalar codes where loads
cannot be initiated well ahead of time, which is precisely the sort of
thing that RISC architectures are optimized for. Supers, on the other
hand, have a much different heritage - numeric codes with lots of
parallelism. So super-bashing really isn't warranted - it's largely an
apples-to-oranges comparison.

>--
>Sean Eric Fagan | "Time has little to do with infinity and jelly donuts."
>seanf@sco.COM   |    -- Thomas Magnum (Tom Selleck), _Magnum, P.I._
>(408) 458-1422  | Any opinions expressed are my own, not my employers'.

------------
Dave Christie        My opinions only, not my employer's