Path: utzoo!attcan!uunet!csinc!rpeglar
From: rpeglar@csinc.UUCP (Rob Peglar x615)
Newsgroups: comp.arch
Subject: Re: cache speed
Summary: It ain't what you think.
Message-ID: <107@csinc.UUCP>
Date: 24 Aug 89 15:10:15 GMT
References: <1473@unocss.UUCP> <3941@phri.UUCP> <1736@crdgw1.crd.ge.com> <MCCALPIN.89Aug22071730@masig3.ocean.fsu.edu>
Organization: Control Systems, Inc., St. Paul MN
Lines: 85

In article <MCCALPIN.89Aug22071730@masig3.ocean.fsu.edu>, mccalpin@masig3.ocean.fsu.edu (John D. McCalpin) writes:
> In article <MCCALPIN.89Aug18054110@masig3.ocean.fsu.edu> I wrote that
> the memory subsystem on the ETA-10G had an access time near 30 ns.
> 
> In article <1878@brwa.inmos.co.uk> davidb@braa.inmos.co.uk (David
> Boreham) replied:
> >Are you sure that the actual *memory* subsystem was cycled in 30ns ?
> 
> I'm not sure how to answer this, but I will make the following notes:
> (1) There is no cache.

Not in the sense of a general-purpose I/D cache.  There is, however, a
"buffer" (really a cache) for instructions.  Instruction fetch is rather
coarse-grained, so many loops fit in the buffer without accessing main memory.

> (2) The memory is definitely interleaved.  I don't remember how many
>     banks, or what the bank busy time is.

I do, but I can't tell you, lest the hordes of CDC lawyers descend upon
me.  The bank count and bank busy time may be considered proprietary
information, and people like me (former employees) can be sued.  In my
opinion, however, the organization is similar to that of its predecessor
machine(s).
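To see why bank conflicts matter, here is a minimal Python sketch of low-order
interleaving, assuming one request issued per cycle in program order with no
request buffering.  The bank count (16) and bank busy time (4 cycles) are
placeholder values chosen for illustration, NOT the ETA-10's actual figures,
and `simulate` is a hypothetical helper.

```python
def simulate(addresses, num_banks, bank_busy):
    """Return the number of cycles needed to issue all word accesses.

    Simplified model (an assumption, not the real ETA-10 design):
    - bank = address mod num_banks (low-order interleaving)
    - at most one request issues per cycle, in order
    - a request to a bank still busy from an earlier access stalls
      until that bank is free again
    """
    free_at = [0] * num_banks  # cycle at which each bank can accept an access
    cycle = 0
    for addr in addresses:
        bank = addr % num_banks
        cycle = max(cycle, free_at[bank])  # stall on a bank conflict
        free_at[bank] = cycle + bank_busy  # bank is busy for bank_busy cycles
        cycle += 1                         # next request may issue next cycle
    return cycle


# Placeholder parameters: 16 banks, 4-cycle bank busy time, 64 words.
n = 64
print(simulate(range(0, n), 16, 4))           # unit stride: 64 cycles
print(simulate(range(0, 16 * n, 16), 16, 4))  # stride == bank count: 253 cycles
```

With unit stride the accesses rotate through all banks and never wait; with a
stride equal to the bank count every access lands in the same bank, and the
issue rate drops to one word per bank busy time.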

> (3) A scalar load instruction requires either 6 or 8 cycles -- I don't
>     remember which.
> (4) The instruction decode is probably overlapped with the previous
>     instruction.

See above.  In my opinion, the scalar load instructions (0x7e for 64-bit,
0x5e for 32-bit) are abysmally slow, taking more cycles than you would
guess.  This was, and is, a major factor in the ETA-10's scalar/vector
imbalance, an attribute that left a bad taste in the mouths of many
customers and potential customers.
Just my opinion.

> 
> So I conclude that the memory access time is 40-50 ns. Not as good as
> I thought originally, but still fairly impressive.  It will be even
> more impressive if they can deliver the 128 MB array at those speeds.
> I hear that the 128 MB array works fine at 10.5 ns, and they are trying
> to tweak it to run at 7 ns to deliver to FSU.

Only 4 months later than promised, and counting   :-)
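The arithmetic behind the 40-50 ns estimate is just load latency in cycles
times the clock period.  A minimal sketch, assuming (this is not stated in the
post) that the 7 ns and 10.5 ns figures are clock periods and that none of the
6 or 8 load cycles is hidden by overlap; `load_latency_ns` is a hypothetical
helper.

```python
def load_latency_ns(cycles: int, clock_ns: float) -> float:
    """Scalar load latency in nanoseconds = latency in cycles x clock period."""
    return cycles * clock_ns


# Assumed clock periods (ns) and the 6-or-8 cycle scalar load from the post.
for clock_ns in (7.0, 10.5):
    for cycles in (6, 8):
        print(f"{cycles} cycles @ {clock_ns} ns = "
              f"{load_latency_ns(cycles, clock_ns):.1f} ns")
```

At 7 ns this gives 42 to 56 ns, in the same neighborhood as the 40-50 ns
estimate; at 10.5 ns the same cycle counts would mean 63 to 84 ns.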

> 
> A similar access time number can be obtained from the vector unit
> timings.  The vector startup overhead is 16 cycles in the best case.
> This consists of the time to load the first element of each array,
> plus the pipeline length (5 cycles ???), plus the time required to put
> the first result back to memory, plus any other work that can't be
> overlapped.  This suggests an access time pretty close to 5-6 cycles....

You're in the ballpark.
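The vector-startup argument can be written as
startup = load + pipeline + store + other and solved for the memory access
time.  A minimal sketch, assuming (hypothetically) that the first-element loads
of all operand arrays overlap with one another, that load and store latencies
are equal, and using the post's uncertain 5-cycle pipeline length;
`implied_access_cycles` is a hypothetical helper.

```python
def implied_access_cycles(startup: float, pipeline: float,
                          other: float = 0.0) -> float:
    """Solve startup = access + pipeline + access + other for access.

    Assumes the load of the first operand elements and the store of the
    first result each cost one memory access time, and that operand loads
    proceed in parallel (an assumption, not a documented ETA-10 detail).
    """
    return (startup - pipeline - other) / 2.0


print(implied_access_cycles(16, 5))     # no other overhead: 5.5 cycles
print(implied_access_cycles(16, 5, 1))  # one extra cycle:   5.0 cycles
```

With 0 to 1 cycle of non-overlapped work this lands at 5 to 5.5 cycles,
consistent with the 5-6 cycle estimate.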

> 
> >As the previous poster pointed out, you can get 20--25ns SRAMs, but
> >to build a 32Mbyte system which randomly cycles in 30ns, using CMOS
> >sounds rather unlikely. Perhaps the memory is interleaved ?
> 
> The previous timings are based on the absence of memory bank conflicts.
> The memory is CMOS SRAM.
> 
> >Also, surely the instruction fetch would be pipelined to overlap
> >with the previous instruction ?
> 
> Most likely, but I don't have a reference on what exactly is overlapped
> on that machine.  It seems that the details of the Cyber 205 are much
> more widely known than the details of the ETA-10.

You are quite correct, John.  Keeping the ETA-10 details out of wide
circulation was, in my opinion, a deliberate strategy on CDC's part.


Have fun.


Rob

Rob Peglar	...!uunet!csinc!rpeglar
Manager, Software R&D, Workstation Group
Control Systems, Inc., St. Paul, MN

The opinions expressed herein are solely those of the author.  
Such opinions do not reflect in any way upon my employer.
So there.