Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site ames.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!gamma!epsilon!zeta!sabre!petrus!bellcore!decvax!ucbvax!ucdavis!lll-crg!dual!ames!eugene
From: eugene@ames.UUCP (Eugene Miya)
Newsgroups: net.arch
Subject: Re: Cray-2 impressions (longer)
Message-ID: <1203@ames.UUCP>
Date: Wed, 16-Oct-85 13:23:01 EDT
Article-I.D.: ames.1203
Posted: Wed Oct 16 13:23:01 1985
Date-Received: Sat, 19-Oct-85 07:39:30 EDT
References: <1189@ames.UUCP> <224@well.UUCP> <919@lll-crg.ARpA>
Distribution: net
Organization: NASA-Ames Research Center, Mtn. View, CA
Lines: 94

I posted my original message after returning from the CUG meeting, in a
moment of sweeping beauty upon seeing the C-2. Truly a sight to behold.

> > Is it true that the CRAY-2 cpu is really a CRAY-1 (not an X-MP)
> >cpu, meaning that it has only one path to memory and doesn't do
> >chaining?
> Yes, it's true.

Agreed, but is a single data path the only criterion for a Cray-1? The
Cray-2 is in some ways a new machine, and it is not instruction-set
compatible with the 1s or Xs, thus upsetting many existing
batch-oriented sites.

> >So the only major speedup between the X-mp and -2
> >cpu's is the faster clock cycle of the 2, 4.1 ns.
> yes

That is an oversimplification in some ways. The 2 has four CPUs, so is
the thing 4 times faster? Architects have yet to discover Brooks's [not
Eugene's] Law [I guess it derives from Amdahl's Law]. The 2 also got rid
of two banks of CPU registers. It is my understanding that there was
controversy inside Cray about how effective chaining and those registers
really are. Time will tell.

> >Also, I understand that the 256K 64-bit memory is slower than the
> >memory on the x-mp, but there is a fast 16K memory cache per processor.
> Yes, the latency of the main memory is a real problem.

We have an X-MP/1 [MOS] and an X-MP/2 [bipolar] (the X-MP/12 and
X-MP/22 referred to below), each with exactly 2 MW of memory, so both
have precisely 16 banks of memory.
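The bank arithmetic behind a stride sweep like the memory contention
test can be sketched as a small simulation. This is a minimal sketch
under assumptions of my own, not the actual test: 16 banks selected by
address modulo 16, at most one load issued per clock, and a
hypothetical bank busy time of 8 clocks (BANK_BUSY is illustrative, not
a measured X-MP figure). A stride that shares factors with 16 cycles
through only 16/gcd(stride, 16) banks, so loads queue behind busy banks.
The second CPU's interference on a multiprocessor is not modeled.

```python
from math import gcd

N_BANKS = 16    # 2 MW interleaved across 16 banks, as in the post
BANK_BUSY = 8   # hypothetical bank busy time in clocks (assumption)


def simulated_clocks_per_access(stride, n_banks=N_BANKS,
                                bank_busy=BANK_BUSY, n=1024):
    """Issue n+1 loads at a fixed word stride, at most one per clock.

    A load to a bank that is still busy stalls until that bank frees up.
    Returns steady-state clocks per access: the issue time of load n
    (load 0 issues at clock 0) divided by n.
    """
    free_at = [0] * n_banks   # clock at which each bank is free again
    clock = 0                 # earliest clock the next load may issue
    issue = 0
    for i in range(n + 1):
        bank = (i * stride) % n_banks        # low-order interleaving
        issue = max(clock, free_at[bank])    # stall on a busy bank
        free_at[bank] = issue + bank_busy
        clock = issue + 1
    return issue / n


def predicted_clocks_per_access(stride, n_banks=N_BANKS,
                                bank_busy=BANK_BUSY):
    """Closed form: the stream cycles through n_banks // gcd banks."""
    banks_touched = n_banks // gcd(stride, n_banks)
    return max(1.0, bank_busy / banks_touched)


if __name__ == "__main__":
    for stride in range(1, 37):
        print(f"{stride:3d} {simulated_clocks_per_access(stride):4.1f}")
```

With these assumed numbers, odd strides and twice-odd strides run at one
access per clock; strides 4, 12, 20, 28 and 36 take 2 clocks, 8 and 24
take 4, and 16 and 32 take 8. That is one way a stride plot can show
peaks only over multiples of 4, each step a factor of 2 higher.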
I have a memory contention test whose results plot like the following,
with access time on the vertical axis:

[ASCII plot; layout lost in transmission. Access time (vertical) versus
stride (horizontal, marked at 4, 8, 12, 16, 20, 24, 28, 32, 36): a flat
floor with peaks of varying height over the marked strides, continuing
past 36.]

I knew I should have used dataplot. I hate plotting on an ASCII device.
The beauty of this plot is that the curve has the same shape on the 12
as on the 22. The X-MP/12 takes about 50% longer to do a memory access
than the X-MP/22. I've been given various explanations, but I suspect
it's strictly because of the MOS vs. bipolar memory technology. Note
that the peaks are higher than the surrounding overhead by precise
factors of 2. The floor of the graph is not 0, but the peaks are
correctly positioned over the numbers I indicated. I can also see noise
on the 22 because [we think] it's a multiprocessor, and bank contention
takes place because of the second CPU. So it's mostly in the memory
speed. [Again, grossly oversimplified.]

> >So it really looks like a CDC 7600!!
> I'm sure you would prefer the Cray 2. The user does not
> see the 16k local memory, the compiler does. Some call it a cache.
> > Question is, will the Y-MP be faster? (16 processors, 64Mwords)

Y-MP? What's a Y-MP? Sorry, I cannot comment on the Y-MP. Write to Cray;
they are on the net. I have not signed a non-disclosure agreement, but I
once opened my mouth a tiny bit too wide [emphasis on tiny] on this net,
and a tidal wave from the MN/WI area hit me.

> > Have you looked at the Cray-2 compiler... I hear it's based on the old
> > CFT1.10 and doesn't have character data (yet).
> > I'd like to see some comparison timings between the x-mp and the 2.
> > [rchrd] = Richard Friedman
> > Pacific-Sierra Research, 2855 Telegraph #415
> When the xmp is benchmarked against the 2 the xmp usually wins unless
> one can manage to effectively buffer vectors through the 16k cache and
> make a lot of uses of the vector data.
> If the cache can't be effectively used and the 3-port architecture is
> usable on the loop, the xmp wins.

CFT2 is currently based on the 1.09 version, I believe. I am uncertain
about all the plans for upgrades. CFT77 [formerly NFT] is written in
Cray Pascal in an attempt to ease maintenance, make it easy to add new
vectorization and multitasking features, and so forth. CFT77 will need
a considerable shakedown, as CFT (written in CAL) is quite mature in
some ways.

Regarding performance: see my memory contention test above for why.

Richard, I've stopped by your office, and I welcome you to see my other
performance material on the X-MP and 2. I just showed some of it to the
LLNL people [George Michael] the other day. Bring your German parallel
processor bibliography with you.

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,decwrl,allegra}!ames!aurora!eugene
  emiya@ames-vmsb