Path: utzoo!attcan!uunet!csinc!rpeglar
From: rpeglar@csinc.UUCP (Rob Peglar x615)
Newsgroups: comp.arch
Subject: Re: cache speed
Summary: It ain't what you think.
Message-ID: <107@csinc.UUCP>
Date: 24 Aug 89 15:10:15 GMT
References: <1473@unocss.UUCP> <3941@phri.UUCP> <1736@crdgw1.crd.ge.com>
Organization: Control Systems, Inc., St. Paul MN
Lines: 85

In article , mccalpin@masig3.ocean.fsu.edu (John D. McCalpin) writes:
> In article I wrote that
> the memory subsystem on the ETA-10G had an access time near 30 ns.
>
> In article <1878@brwa.inmos.co.uk> davidb@braa.inmos.co.uk (David
> Boreham) replied:
> >Are you sure that the actual *memory* subsystem was cycled in 30ns ?
>
> I'm not sure how to answer this, but I will make the following notes:
> (1) There is no cache.

Not in the sense of a general-purpose I/D cache.  There is, however, a
"buffer" (really a cache) for instructions.  The instruction fetch is
rather coarse; thus, many loops can fit without accessing main memory.

> (2) The memory is definitely interleaved.  I don't remember how many
>     banks, or what the bank busy time is.

I do, but I can't tell you, lest the hordes of CDC lawyers descend upon
me.  The banking/bank busy time may be considered proprietary info, and
people like me (former employees) can be sued.  In my opinion, however,
it is similar to that of its predecessor machine(s).

> (3) A scalar load instruction requires either 6 or 8 cycles -- I don't
>     remember which.
> (4) The instruction decode is probably overlapped with the previous
>     instruction.

See above.  In my opinion, the scalar load (0x7e {64-bit}, 0x5e {32-bit})
instructions are abysmally slow.  More cycles than you would guess.  This
was, and is, a major factor in the ETA-10's scalar/vector imbalance, an
attribute which contributed to the bad taste the machine left in a lot of
customers' and potential customers' mouths.  Just my opinion.

>
> So I conclude that the memory access time is 40-50 ns.  Not as good as
> I thought originally, but still fairly impressive.  It will be even
> more impressive if they can deliver the 128 MB array at those speeds.
> I hear that the 128 MB array works fine at 10.5 ns, and they are trying
> to tweak it to run at 7ns to deliver to FSU.

Only 4 months later than promised, and counting :-)

>
> A similar access time number can be obtained from the vector unit
> timings.  The vector startup overhead is 16 cycles in the best case.
> This consists of the time to load the first element of each array,
> plus the pipeline length (5 cycles ???), plus the time required to put
> the first result back to memory, plus any other work that can't be
> overlapped.  This suggests an access time pretty close to 5-6 cycles....

You're in the ballpark.

>
> >As the previous poster pointed out, you can get 20--25ns SRAMs, but
> >to build a 32Mbyte system which randomly cycles in 30ns, using CMOS
> >sounds rather unlikely.  Perhaps the memory is interleaved ?
>
> The previous timings are based on the absence of memory bank conflicts.
> The memory is CMOS SRAM.
>
> >Also, surely the instruction fetch would be pipelined to overlap
> >with the previous instruction ?
>
> Most likely, but I don't have a reference on what exactly is overlapped
> on that machine.  It seems that the details of the Cyber 205 are much
> more widely known than the details of the ETA-10.

You are quite correct, John.  This was, in my opinion, a deliberate
strategy on CDC's part.

Have fun.

Rob

Rob Peglar	...!uunet!csinc!rpeglar
Manager, Software R&D, Workstation Group
Control Systems, Inc., St. Paul, MN

The opinions expressed herein are solely those of the author.  Such
opinions do not reflect in any way upon my employer.  So there.