Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uflorida!stat!stat.fsu.edu!mccalpin
From: mccalpin@masig3.ocean.fsu.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: cache speed
Message-ID:
Date: 22 Aug 89 11:17:29 GMT
References: <1473@unocss.UUCP> <3941@phri.UUCP> <1736@crdgw1.crd.ge.com> <1878@brwa.inmos.co.uk>
Sender: news@stat.fsu.edu
Organization: Supercomputer Computations Research Institute
Lines: 45
In-reply-to: davidb@braa.inmos.co.uk's message of 21 Aug 89 14:49:54 GMT

In an earlier article I wrote that the memory subsystem on the ETA-10G
had an access time near 30 ns.

In article <1878@brwa.inmos.co.uk> davidb@braa.inmos.co.uk
(David Boreham) replied:

>Are you sure that the actual *memory* subsystem was cycled in 30ns ?

I'm not sure how to answer this, but I will make the following notes:

(1) There is no cache.
(2) The memory is definitely interleaved. I don't remember how many
    banks, or what the bank busy time is.
(3) A scalar load instruction requires either 6 or 8 cycles -- I don't
    remember which.
(4) The instruction decode is probably overlapped with the previous
    instruction.

So I conclude that the memory access time is 40-50 ns. Not as good as
I thought originally, but still fairly impressive. It will be even
more impressive if they can deliver the 128 MB array at those speeds.
I hear that the 128 MB array works fine at 10.5 ns, and they are trying
to tweak it to run at 7 ns to deliver to FSU.

A similar access time number can be obtained from the vector unit
timings. The vector startup overhead is 16 cycles in the best case.
This consists of the time to load the first element of each array,
plus the pipeline length (5 cycles ???), plus the time required to put
the first result back to memory, plus any other work that can't be
overlapped. This suggests an access time pretty close to 5-6 cycles....
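As a rough check on the arithmetic above, here is a minimal sketch. It rests on assumptions that are not in the post: the two speeds quoted for the 128 MB array (7 ns and 10.5 ns) are treated as the clock period, the vector startup is modeled as one memory load plus the pipeline plus one memory store with nothing else left unoverlapped, and the function names are purely illustrative.

```python
# Back-of-envelope check of the access-time estimates.
# ASSUMPTION: the clock period equals one of the array speeds quoted in
# the post (7 ns or 10.5 ns). The post does not state the clock directly.

def scalar_access_ns(load_cycles, clock_ns):
    """Upper bound: count the whole scalar load latency as memory access."""
    return load_cycles * clock_ns

def vector_access_cycles(startup_cycles=16, pipeline_cycles=5):
    """ASSUMPTION: startup = load + pipeline + store, where the loads of
    the input arrays overlap each other, the store costs the same as a
    load, and no other unoverlapped work remains."""
    return (startup_cycles - pipeline_cycles) / 2

if __name__ == "__main__":
    for clock_ns in (7.0, 10.5):
        for cycles in (6, 8):
            ns = scalar_access_ns(cycles, clock_ns)
            print(f"scalar load, {cycles} cycles at {clock_ns} ns: {ns} ns")
        v = vector_access_cycles()
        print(f"vector estimate at {clock_ns} ns: {v} cycles = {v * clock_ns} ns")
```

Under these assumptions a 7 ns clock gives 42 to 56 ns for the scalar load and 5.5 cycles (38.5 ns) from the vector startup, which is in line with the 40-50 ns and 5-6 cycle figures quoted above once a cycle or two of non-memory work is allowed for.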
>As the previous poster pointed out, you can get 20--25ns SRAMs, but
>to build a 32Mbyte system which randomly cycles in 30ns, using CMOS
>sounds rather unlikely. Perhaps the memory is interleaved ?

The previous timings are based on the absence of memory bank conflicts.
The memory is CMOS SRAM.

>Also, surely the instruction fetch would be pipelined to overlap
>with the previous instruction ?

Most likely, but I don't have a reference on what exactly is
overlapped on that machine. It seems that the details of the Cyber 205
are much more widely known than the details of the ETA-10.
--
John D. McCalpin - mccalpin@masig1.ocean.fsu.edu - mccalpin@nu.cs.fsu.edu
                   mccalpin@delocn.udel.edu