Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site ames.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!bellcore!decvax!hplabs!ames!eugene
From: eugene@ames.UUCP (Eugene Miya)
Newsgroups: net.arch
Subject: Re: CRAY Question (also MIP/Mhz)
Message-ID: <1496@ames.UUCP>
Date: Thu, 1-May-86 12:08:53 EDT
Article-I.D.: ames.1496
Posted: Thu May 1 12:08:53 1986
Date-Received: Sat, 3-May-86 19:50:03 EDT
References: <905@harvard.UUCP>
Distribution: net
Organization: NASA-Ames Research Center, Mtn. View, CA
Lines: 47

> In one of Jack Dongarra's articles on LINPACK performance
> (Computer Architecture News, vol 11, no 5 (Dec 83)), he says that a
> CRAY 1-M executes the benchmark faster than a CRAY 1-S because the
> 1-M has slower memory. I fail to see how it
> is even theoretically possible for slower memory to mean higher performance,
> and would appreciate someone who knows about CRAY's explaining this to me
> (Dongarra talks about a "missed chain-slot").
> Ehud Reiter

George Spix from Cray Research (gas@lanl) should be able to answer this, but I've not seen him lately, so I'll give it a shot.

First, a point of clarification. Technically, there are no more Cray-1Ms. We had one here, and it was redesignated a Cray X-MP/1. This machine is moving to UC Berkeley next month.

Second, you should realize that X-MPs are cleaned-up Cray-1s (not 1Ss). They have a faster cycle time (9.5 ns versus 12.5 ns), they have vector chaining (I assume you know what chaining is; otherwise check an architecture book, since your message did not sound like a specific request for a description of chaining), and they have three paths between memory and CPU rather than one. The X-MP/1 has slower MOS memory rather than the bipolar memory that comes with the 'top of the line' models (read: the current fastest X-MPs; the /2 is MOS, and it also has only one data path to a given quadrant of memory).
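The "missed chain-slot" Dongarra mentions can be illustrated with a toy timing model. As the Cray-1 is commonly described, a vector instruction that consumes another instruction's result can chain only if it issues in the one clock period when the first result element arrives; if it misses that slot, it must wait until the whole vector register has been written. The sketch below is a simplified model under stated assumptions: the function `load_then_add`, its parameters, and every latency in it are hypothetical illustrations, not measured Cray-1S or 1-M figures. It shows how a slower memory can push the chain slot late enough for a busy functional unit to catch it.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    chained: bool    # did the dependent add hit the chain slot?
    add_issue: int   # clock at which the add issues
    add_finish: int  # clock at which the add's last result is written


def load_then_add(mem_latency: int, add_unit_free_at: int,
                  vector_length: int = 64, add_latency: int = 6) -> Timing:
    """Toy model of a vector load followed by a dependent vector add.

    Hypothetical assumptions (not Cray specifications):
      - the load issues at clock 0; its first element reaches the vector
        register at clock mem_latency, then one element arrives per clock;
      - the add may chain only in that single 'chain slot' clock, and only
        if its functional unit is free by then (if ready early, it waits);
      - if it misses the slot, it waits until the load has written every
        element and released the register.
    """
    chain_slot = mem_latency
    load_last_element = mem_latency + vector_length - 1
    if add_unit_free_at <= chain_slot:
        chained = True
        add_issue = chain_slot
    else:
        chained = False
        add_issue = max(add_unit_free_at, load_last_element + 1)
    # First add result appears add_latency clocks after issue,
    # then one result per clock for the remaining elements.
    add_finish = add_issue + add_latency + vector_length - 1
    return Timing(chained, add_issue, add_finish)


if __name__ == "__main__":
    # Hypothetical: the add unit is busy with an earlier instruction until clock 5.
    fast = load_then_add(mem_latency=4, add_unit_free_at=5)
    slow = load_then_add(mem_latency=6, add_unit_free_at=5)
    print("fast memory:", fast)  # misses the chain slot, waits for the full load
    print("slow memory:", slow)  # hits the chain slot and finishes sooner
```

With these made-up numbers the slower memory finishes the add at clock 75 instead of 137: the fast load's chain slot (clock 4) passes while the add unit is still busy, so the add has to wait for all 64 elements. Real Cray timing involves more constraints (bank conflicts, issue rules), so this only illustrates the shape of the effect.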
Third, machines like these are not like micros and minis: you really tune them for the slowness of memory (any memory), and delay loops are unacceptable. You count clock periods and design architectural features to compensate for them (e.g., chaining). It takes four clocks to get a word of memory into a CPU (assuming no bank contention). I have also been told by a Cray site engineer here that the newer MOS memories have a slightly different internal organization. All this is why I pointed out (via mail) to the fellow at NC that the MIPS/MHz metric has a von Neumann bottleneck problem.

Finally, I have measured this effect, and I posted the results to the Net over a year ago, but while cleaning out my old author_copy file I removed that posting (it included a graph).

As an aside, the X-MP also has a nice hardware box known as the Hardware Performance Monitor, which does instruction counts non-intrusively (another reason why a VAX is a poor machine to do performance research on).

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  com'on do you trust Reply commands with all these different mailers?
  {hplabs,ihnp4,dual,hao,decwrl,tektronix,allegra}!ames!aurora!eugene
  eugene@ames-aurora.ARPA