Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!csinc!rpeglar
From: rpeglar@csinc.UUCP (Rob Peglar x615)
Newsgroups: comp.arch
Subject: Re: The Killer Micro From Hell
Summary: System balance in supers - close but not quite
Keywords: cpu starvation, memory bandwidth
Message-ID: <159@csinc.UUCP>
Date: 2 Jan 90 15:45:50 GMT
References: <158@csinc.UUCP> <787@stat.fsu.edu>
Organization: Control Systems, Inc., St. Paul MN
Lines: 81

In article <787@stat.fsu.edu>, mccalpin@stat.fsu.edu (John Mccalpin) writes:
> In article <158@csinc.UUCP> rpeglar@csinc.UUCP (Rob Peglar x615) writes:
> >
> >Anyway, you should carefully look at the issue of CPU starvation on some
> >of the very machines you tout - like the Cray-2. Some (not all) of the
> >smaller machines exhibit much less CPU starvation. The ETA-10 is (was)
> >another notable example of real and potential CPU starvation as an
> >architectural flaw.
>
> It seems odd to mention the Cray-2 and the ETA-10 in the same sentence
> with regard to "CPU starvation". It seems to me that the ETA-10 is a
> much more balanced design with regard to memory bandwidth -- I don't
> know about I/O speeds past the shared memory, though... With the most
> recent release of the operating system, we have gotten paging rates of
> >500 MB/s on thrashing jobs. This is almost 1/2 of the physical I/O
> bandwidth to shared memory. Earlier system software certainly left the
> cpu hungry, but the hardware is capable of some pretty tremendous
> bandwidth, and the software is finally starting to catch up....

Sounds like the work Chris' group (particularly JPH) is finally bearing
fruit - seven months too late..... :-(

McCalpin is correct about the ETA-10 being a "more" balanced design.
Let's take a look at the ETA-10 from the "external" memory perspective -
ignoring the "internal" (e.g. RNI) paths from 1st level store to CPU(s).
Take my word for it, the internal paths from 1st level store to the CPUs
are sufficient.
Otherwise, multi-pipe operations would not be possible.

ETA-10 shared memory (SM) (2nd level store) can feed central memory (CM)
(1st level store) at the rate of one 64-bit word per clock. The CPU can
consume four 64-bit input operands per clock (2 pipes, each doing M-M
vector A op vector B). Assume for this case that the input operands are
considered "used" after the computation, i.e. they won't be needed (ever)
again. Thus, to avoid CPU starvation from the hardware perspective, the
SM-->CM bandwidth is too small by a factor of four. If the "software"
(OS or application) can manage its own memory correctly (i.e. four
SM-->CM transfers of N words for every computation on N words) then the
computation can continue at peak forever. Alas, Babylon. Peak rates are
not sustainable.

This problem becomes even worse if one needs third level store (typ.
disk) to SM to refresh SM in a similar manner. This is exacerbated in the
liquid-cooled machines, typically because the ratio of IOUs to SM size
was too low. Current hardware can only extract about 70% of the max
IOU-->SM bandwidth due to the handshaking across the IOI. Current (1.1.5)
software can only get about 70% of that through the file system. E-mail
me for more discussion.

>
> When Cray Research was founded, they estimated a world market for
> supercomputers that was in the neighborhood of 40 units. Maybe they
> weren't so far off after all!

Probably only a factor of ten.

>
> Anyway, here at FSU we have been pushing the KILLER MICRO bandwagon,
> too. Lets get all those !@#$%^&* scalar jobs _off_ of our vector
> machines and onto the killer micros where they belong.... Then those
> of us who can effectively use the vector machines will have more time
> available.

Amen.

>
> By the way, I estimate the the (soon-to-be-installed) FSU Cray
> Y/MP-4/432 will only be about 125 times as fast as the new MIPS "KILLER
> MICRO from HELL" on my code. Yep, they are closing the gap all right....

See the comment from Eugene Brooks. The key words, of course, are "my
code" ... there are no absolute answers. Once again, the "gap" of
absolute performance is there. The "gap" of price/performance, on the
other hand, is now in the Killer Micro camp, for enough codes to make it
interesting...

John, if you want to discuss more, e-mail...

Rob

--
Rob Peglar	Control Systems, Inc.	2675 Patton Rd., St. Paul MN 55113
...uunet!csinc!rpeglar		612-631-7800

The posting above does not necessarily represent the policies of my employer.
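
P.S. For anyone who wants to check the balance arithmetic in the ETA-10
discussion above, here is a minimal back-of-envelope sketch. The
constant and function names are illustrative only (they are not from any
ETA software); the only inputs taken from the posting are one word per
clock from SM to CM, two pipes each reading two input operands per
clock, and the two "about 70%" efficiency figures, which are treated
here as exactly 0.70 for the sake of the arithmetic.

```python
# Back-of-envelope balance check. All names are hypothetical; only the
# ratios come from the posting.

SM_TO_CM_WORDS_PER_CLOCK = 1      # SM feeds CM one 64-bit word per clock
PIPES = 2                         # two vector pipes
OPERANDS_PER_PIPE_PER_CLOCK = 2   # M-M "vector A op vector B" reads two inputs


def cpu_demand_words_per_clock(pipes=PIPES,
                               operands=OPERANDS_PER_PIPE_PER_CLOCK):
    """64-bit input operands the CPU can consume per clock at peak."""
    return pipes * operands


def starvation_factor(demand, supply=SM_TO_CM_WORDS_PER_CLOCK):
    """How many times too small the SM-->CM path is for use-once operands."""
    return demand / supply


def effective_io_fraction(hardware_fraction=0.70, software_fraction=0.70):
    """Fraction of peak IOU-->SM bandwidth left after both losses."""
    return hardware_fraction * software_fraction


if __name__ == "__main__":
    demand = cpu_demand_words_per_clock()
    print("CPU demand (words/clock):", demand)
    print("SM-->CM shortfall factor:", starvation_factor(demand))
    print("Usable IOU-->SM fraction:", round(effective_io_fraction(), 2))
```

Under those assumptions the shortfall is the factor of four quoted
above, and stacking the two 70% figures leaves roughly half of the peak
IOU-->SM bandwidth visible through the file system.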