Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uflorida!stat!stat.fsu.edu!mccalpin
From: mccalpin@masig3.ocean.fsu.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: cache speed
Message-ID: <MCCALPIN.89Aug22071730@masig3.ocean.fsu.edu>
Date: 22 Aug 89 11:17:29 GMT
References: <1473@unocss.UUCP> <3941@phri.UUCP> <1736@crdgw1.crd.ge.com>
	<MCCALPIN.89Aug18054110@masig3.ocean.fsu.edu> <1878@brwa.inmos.co.uk>
Sender: news@stat.fsu.edu
Organization: Supercomputer Computations Research Institute
Lines: 45
In-reply-to: davidb@braa.inmos.co.uk's message of 21 Aug 89 14:49:54 GMT

In article <MCCALPIN.89Aug18054110@masig3.ocean.fsu.edu> I wrote that
the memory subsystem on the ETA-10G had an access time near 30 ns.

In article <1878@brwa.inmos.co.uk> davidb@braa.inmos.co.uk (David
Boreham) replied:
>Are you sure that the actual *memory* subsystem was cycled in 30ns ?

I'm not sure how to answer this, but I will make the following notes:
(1) There is no cache.
(2) The memory is definitely interleaved.  I don't remember how many
    banks, or what the bank busy time is.
(3) A scalar load instruction requires either 6 or 8 cycles -- I don't
    remember which.
(4) The instruction decode is probably overlapped with the previous
    instruction.

So I conclude that the memory access time is 40-50 ns. Not as good as
I thought originally, but still fairly impressive.  It will be even
more impressive if they can deliver the 128 MB array at those speeds.
I hear that the 128 MB array works fine at 10.5 ns, and they are trying
to tweak it to run at 7 ns to deliver to FSU.
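
As a rough check on the arithmetic, here is a small Python sketch that
converts the quoted scalar-load cycle counts into nanoseconds.  It
assumes the 7 ns and 10.5 ns figures are clock periods, which the post
implies but does not state, and the function name is made up for
illustration.  The whole load takes longer than the memory access
itself, so these numbers are upper bounds on the access time.

```python
# Assumption: 7 ns and 10.5 ns are clock periods; 6 and 8 are the two
# candidate cycle counts for a scalar load quoted in the post.
def load_latency_ns(cycles, clock_ns):
    """Total scalar-load latency in ns for a given cycle count and clock."""
    return cycles * clock_ns

for clock_ns in (7.0, 10.5):
    for cycles in (6, 8):
        latency = load_latency_ns(cycles, clock_ns)
        print(f"{cycles} cycles at {clock_ns} ns/cycle = {latency} ns")
```

At a 7 ns clock this gives 42 ns and 56 ns for the full load, which
brackets the 40-50 ns estimate once a cycle or so is allowed for work
outside the memory itself.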

A similar access time number can be obtained from the vector unit
timings.  The vector startup overhead is 16 cycles in the best case.
This consists of the time to load the first element of each array,
plus the pipeline length (5 cycles, I think), plus the time required to put
the first result back to memory, plus any other work that can't be
overlapped.  This suggests an access time of about 5-6 cycles.
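
The startup budget can be turned into a one-line calculation.  The
sketch below is only a back-of-the-envelope model: it assumes the loads
of the first elements of each array overlap and together cost one
access time, that the store of the first result costs the same, and
that the pipeline length really is 5 cycles.  The function and
parameter names are hypothetical.

```python
def access_cycles(startup=16, pipeline=5, other=0):
    """Solve startup = load + pipeline + store + other for the access time.

    Assumes load and store each cost one access time (in cycles) and that
    'other' is the non-overlapped extra work, which the post does not quantify.
    """
    return (startup - pipeline - other) / 2

print(access_cycles())         # no extra non-overlapped work
print(access_cycles(other=1))  # one cycle of extra work
```

With no extra work the model gives 5.5 cycles per access, and with one
cycle of extra work it gives 5.0.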

>As the previous poster pointed out, you can get 20--25ns SRAMs, but
>to build a 32Mbyte system which randomly cycles in 30ns, using CMOS
>sounds rather unlikely. Perhaps the memory is interleaved ?

The timings above assume no memory bank conflicts.
The memory is CMOS SRAM.
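
To show why bank conflicts matter for an interleaved memory, here is a
toy Python model.  The bank count and bank busy time are invented for
illustration (I don't remember the real ETA-10 values), and the model
issues at most one access per cycle and stalls whenever the target bank
is still busy.

```python
def issue_times(n_accesses, stride, n_banks, bank_busy):
    """Return the cycle at which each access of a strided stream issues.

    n_banks and bank_busy are hypothetical parameters, not ETA-10 figures.
    """
    free_at = [0] * n_banks          # cycle at which each bank is free again
    t = 0                            # earliest cycle the next access may issue
    times = []
    for i in range(n_accesses):
        bank = (i * stride) % n_banks
        t = max(t, free_at[bank])    # stall if the bank is still busy
        times.append(t)
        free_at[bank] = t + bank_busy
        t += 1                       # at most one access issued per cycle
    return times

# Hypothetical machine: 8 banks, each busy for 4 cycles per access.
print(issue_times(16, stride=1, n_banks=8, bank_busy=4)[-1])  # conflict-free
print(issue_times(16, stride=8, n_banks=8, bank_busy=4)[-1])  # every access hits bank 0
```

With unit stride the 16 accesses issue on consecutive cycles, while a
stride equal to the bank count sends every access to the same bank and
the stream runs at the bank busy time instead.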

>Also, surely the instruction fetch would be pipelined to overlap
>with the previous instruction ?

Most likely, but I don't have a reference on what exactly is overlapped
on that machine.  It seems that the details of the Cyber 205 are much
more widely known than the details of the ETA-10.
--
John D. McCalpin - mccalpin@masig1.ocean.fsu.edu - mccalpin@nu.cs.fsu.edu
		   mccalpin@delocn.udel.edu