Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site ames.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!gamma!epsilon!zeta!sabre!petrus!bellcore!decvax!ucbvax!ucdavis!lll-crg!dual!ames!eugene
From: eugene@ames.UUCP (Eugene Miya)
Newsgroups: net.arch
Subject: Re: Cray-2 impressions (longer)
Message-ID: <1203@ames.UUCP>
Date: Wed, 16-Oct-85 13:23:01 EDT
Article-I.D.: ames.1203
Posted: Wed Oct 16 13:23:01 1985
Date-Received: Sat, 19-Oct-85 07:39:30 EDT
References: <1189@ames.UUCP> <224@well.UUCP> <919@lll-crg.ARpA>
Distribution: net
Organization: NASA-Ames Research Center, Mtn. View, CA
Lines: 94

I posted my original message right after returning from the CUG (Cray User Group)
meeting, in a moment of awe at the sweeping beauty of the C-2. Truly a sight to behold.

> >  Is it true that the CRAY-2 cpu is really a CRAY-1  (not an X-MP)
> >cpu, meaning that it has only one path to memory and doesn't do
> >chaining?
> Yes, it's true.

Agreed, but is a single data path the only criterion for being a Cray-1?
The Cray-2 is in some ways a new machine, and it is not instruction-set
compatible with the 1s or the X-MPs, which upsets many existing batch-oriented sites.
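For anyone unsure what chaining buys you: with chaining, results from one vector functional unit stream into the next unit as they come out, instead of waiting for the whole vector to land in a register first. Here is a crude timing sketch, not a model of any real Cray; it ignores chain-slot timing and memory, and the startup latencies are hypothetical numbers, not real Cray figures:

```python
def unchained_cycles(n, startups):
    """Each functional unit waits until the previous one has produced the whole vector."""
    return sum(s + n for s in startups)


def chained_cycles(n, startups):
    """Results stream into the next unit as they emerge, so n elements flow through once."""
    return sum(startups) + n


if __name__ == "__main__":
    # Hypothetical startup latencies (clock periods) for a multiply feeding an add.
    startups = [7, 6]
    for n in (8, 64):
        print(n, unchained_cycles(n, startups), chained_cycles(n, startups))
```

With those made-up latencies, a 64-element multiply feeding an add takes 141 clocks unchained vs. 77 chained; the longer the vector, the closer chaining gets to a 2x win for a two-operation sequence.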

> >So the only major speedup between the X-mp and -2
> >cpu's is the faster clock cycle of the 2,  4.1 ns.
> yes

That's an oversimplification in some ways. The 2 has four CPUs, so is the thing
4 times faster? Architects have yet to discover Brooks's [not Eugene's]
Law [I guess it derives from Amdahl's Law]. The 2 also got rid of two
banks of CPU registers. It is my understanding that there was controversy
inside Cray about how effective chaining and those registers really are.
Time will tell.
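Since I brought up Amdahl's Law: if only a fraction f of the work can run in parallel, p CPUs give a speedup of 1 / ((1 - f) + f/p). A minimal sketch; the parallel fractions below are made-up examples, not measurements of any code on the 2:

```python
def amdahl_speedup(processors, parallel_fraction):
    """Amdahl's Law: the serial part (1 - f) runs at the same speed no matter how many CPUs."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


if __name__ == "__main__":
    for f in (1.0, 0.9, 0.5):  # hypothetical parallel fractions
        print(f"f = {f:.1f}: {amdahl_speedup(4, f):.2f}x on 4 CPUs")
```

Four CPUs give 4x only when f = 1; at f = 0.9 it's about 3.08x, and at f = 0.5 it's 1.6x.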

> >Also, I understand that the 256K 64-bit memory is slower than the
> >memory on the x-mp, but there is a fast 16K memory cache per processor.
> Yes, the latency of the main memory is a real problem.

We have an X-MP/1 [MOS] and an X-MP/2 [bipolar], each with exactly 2 MW of memory,
so they have precisely 16 banks of memory. I have a memory contention test
that produces a plot like the following:

Access time is on the vertical axis; stride is on the horizontal axis.

                     /\
                    /  \
         /\        /    \        /\  
 ___/\__/  \__/\__/      \__/\__/  \ . . .
 ---+----+----+------+------+----+---+---+---+--------
    4    8   12     16     20   24  28  32  36
		Stride

I knew I should have used dataplot. I hate plotting on an ASCII device.
The beauty of this plot is that the curve has the same shape for the 12 and the 22,
even though the X-MP/12 takes about 50% longer to do a memory access than the X-MP/22.
I've been given various explanations, but I suspect it's strictly because
of the MOS vs. bipolar memory technology. Note that the peaks are higher
than the surrounding overhead by precise factors of 2.
The floor of the graph is not 0, but the peaks are correctly positioned
over the stride values I indicated.
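The shape of that plot falls out of how interleaved banks work. Here is a minimal sketch, not Cray's actual memory logic, assuming simple low-order interleaving (word address mod 16 picks the bank), one word issued per clock, and a bank busy time of 8 clocks that I made up for illustration. A stride s only ever touches 16/gcd(s, 16) banks, so multiples of 4 use 4 banks, multiples of 8 use 2, and multiples of 16 hammer a single bank:

```python
from math import gcd

NUM_BANKS = 16  # from the post: the 2 MW X-MPs have 16 memory banks


def banks_touched(stride, banks=NUM_BANKS):
    """Distinct banks hit by addresses 0, s, 2s, ... under low-order interleaving."""
    return banks // gcd(stride, banks)


def cycles_per_access(stride, bank_busy, banks=NUM_BANKS):
    """Steady-state clocks per element for one strided stream.

    The stream comes back to the same bank every banks_touched() accesses;
    if that is sooner than the bank busy time, it has to wait.
    """
    return max(1.0, bank_busy / banks_touched(stride, banks))


if __name__ == "__main__":
    BANK_BUSY = 8  # hypothetical bank busy time in clock periods
    for stride in range(1, 37):
        print(f"stride {stride:2d}: {cycles_per_access(stride, BANK_BUSY):.0f} clocks/word")
```

With that assumed busy time, strides 4, 8, and 16 cost 2, 4, and 8 clocks per word, so each bigger peak is a factor of 2 above the last, at least qualitatively like the plot. Changing BANK_BUSY moves the peak heights; if it drops to 4 or below, the stride-4 bump disappears entirely.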

I can also see noise on the 22, [we think] because it's a multiprocessor
and the second CPU causes bank contention.
So it's mostly in the memory speed. [Again, grossly oversimplified.]

> >So it really looks like a CDC 7600!!
> I'm sure you would prefer the Cray 2.  The user does not
> see the 16k local memory, the compiler does.
Some call it a cache.
> > Question is, will the Y-MP be faster?  (16 processors, 64Mwords)

Y-MP? What's a Y-MP?
Sorry, I cannot comment on the Y-MP. Write to Cray; they are on the net.
I have not signed a non-disclosure agreement, but I once opened my mouth
a tiny bit too wide [emphasis on tiny] on this net, and a tidal wave from
the MN/WI area hit me.
 
> > Have you looked at the Cray-2 compiler... I hear it's based on the old
> > CFT1.10 and doesn't have character data (yet).
> > I'd like to see some comparison timings between the x-mp and the 2.
> >   [rchrd] = Richard Friedman
> >             Pacific-Sierra Research, 2855 Telegraph #415
> When the xmp is benchmarked against the 2, the xmp usually wins unless
> one can manage to effectively buffer vectors through the 16k cache and
> make many uses of the vector data. If the cache can't be effectively
> used and the 3-port architecture is usable on the loop, the xmp
> wins.
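What "buffer vectors through the 16k cache and make a lot of uses of the vector data" amounts to is strip-mining: bring a strip of the vector into local memory once and do all the work on it before moving on. A toy sketch that just counts main-memory words loaded; it assumes an elementwise operation (no element depends on its neighbors), ignores stores, and the function names are hypothetical, not anything in CFT2:

```python
LOCAL_WORDS = 16 * 1024  # 16K-word local memory per CPU, per the quoted post


def run_unbuffered(data, passes, op):
    """Every pass streams the whole vector in from main memory."""
    words_loaded = 0
    for _ in range(passes):
        words_loaded += len(data)
        data = [op(v) for v in data]
    return data, words_loaded


def run_strip_mined(data, passes, op, local_words=LOCAL_WORDS):
    """Load one strip into local memory, apply every pass to it, then move on.

    Only valid because op is elementwise: no element depends on another strip.
    """
    words_loaded = 0
    out = []
    for start in range(0, len(data), local_words):
        strip = data[start:start + local_words]  # the one main-memory load per word
        words_loaded += len(strip)
        for _ in range(passes):
            strip = [op(v) for v in strip]
        out.extend(strip)
    return out, words_loaded


def scale(v):
    """Hypothetical elementwise kernel."""
    return 0.5 * v + 1.0


if __name__ == "__main__":
    x = [float(i) for i in range(40_000)]
    a, loads_a = run_unbuffered(x, 5, scale)
    b, loads_b = run_strip_mined(x, 5, scale)
    assert a == b  # same answer either way
    print(loads_a, loads_b)  # 200000 vs 40000 words from main memory
```

Five passes cost 5n words from main memory without the buffer and n with it. If the loop touches each word only once, there is nothing to save, and the X-MP's extra memory ports win, which is the quoted point.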

I believe CFT2 is currently based on version 1.09. I am uncertain
about any plans for an upgrade. CFT77 [formerly NFT] is written in Cray
Pascal in an attempt to ease maintenance, make it easier to add new
vectorization and multitasking features, and so forth. CFT77 will need
a considerable shakedown, as CFT (written in CAL) is quite mature in some ways.

Regarding performance: see my test above for why. Richard,
I've stopped by your office, and you're welcome to come see my other performance
stuff on the X-MP and 2. I just showed some of it to the LLNL people
[George Michael] the other day. Bring your German parallel processor
bibliography with you.

From the Rock of Ages Home for Retired Hackers:
--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,decwrl,allegra}!ames!aurora!eugene
  emiya@ames-vmsb