Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ukma!gatech!prism!loligo!mccalpin
From: mccalpin@loligo.cc.fsu.edu (John McCalpin)
Newsgroups: comp.arch
Subject: Re: Do you have bandwidth?
Summary: the other half of the story
Keywords: memory bandwidth latency
Message-ID: <593@loligo.cc.fsu.edu>
Date: 17 Apr 89 16:53:22 GMT
References: <7766@thorin.cs.unc.edu> <592@loligo.cc.fsu.edu>
Reply-To: mccalpin@loligo.cc.fsu.edu (John McCalpin)
Organization: Supercomputer Computations Research Institute
Lines: 42

In article <592@loligo.cc.fsu.edu> I wrote:
>One place where the distinction between latency and bandwidth shows up
>very clearly is in the CDC/ETA line of supercomputers. These machines
>(the Cyber 205 and ETA-10) use a memory-to-memory vector architecture.

I then went on to discuss the bandwidth, but not the latency.
I guess that I didn't make a very clear distinction. :-)

Recap: we have a machine with four 7 ns CPUs, each with 32 MB of SRAM
and a 6850 MB/s memory channel. The CPUs share another 1 GB of DRAM,
with four 1140 MB/s channels currently installed (one to each CPU).

The latency is important because the overhead of setting up a
memory-to-memory vector operation includes the memory latency plus
the pipe length, plus other stuff relating to decoding the
instruction, etc.

The latency of the SRAM on the ETA-10 is about 6-8 cycles, and the
pipe length is 5 cycles. So even if the instruction took zero time to
decode (don't we all wish!), there should be an overhead of 11-13
cycles on each vector operation. In fact, the hardware overhead on
the ETA-10 is down to about 16-23 cycles, depending on how the banks
are aligned for the input and output vectors. This allows very good
performance on fairly short vectors.

The latency tends to be more of a bother in the random gather/scatter
instructions. The ETA-10 (like the Cray machines) uses banked memory,
set up so that sequential accesses come at full (6850 MB/s) speed.
Random accesses can be MUCH slower.
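
To make the banking point concrete, here is a rough sketch of which
banks a constant-stride sweep lands on. It is only a toy model, not
the ETA-10's actual addressing: it assumes word w lives in bank
w mod N, and N = 16 is picked purely for illustration. The names
banks_touched and bank_sequence are made up for this example.

```python
from math import gcd

# ASSUMPTION: 16 banks, chosen for illustration only; this is not a
# documented ETA-10 figure.
N_BANKS = 16


def banks_touched(stride: int, n_banks: int = N_BANKS) -> int:
    """Count the distinct banks visited by a long constant-stride sweep.

    ASSUMPTION: simple low-order interleaving, i.e. word w is stored in
    bank (w % n_banks). Under that mapping the sweep cycles through
    n_banks / gcd(stride, n_banks) different banks.
    """
    return n_banks // gcd(stride, n_banks)


def bank_sequence(stride: int, count: int, n_banks: int = N_BANKS) -> list:
    """Return the bank index hit by each of the first `count` accesses."""
    return [(i * stride) % n_banks for i in range(count)]


if __name__ == "__main__":
    for stride in (1, 2, 8, 16, 17):
        print(stride, banks_touched(stride), bank_sequence(stride, 8))
```

Under these assumptions a stride of 1 (or 17) rotates through all 16
banks, a stride of 8 alternates between just two banks, and a stride
of 16 hits bank 0 on every access.
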
Repeated accesses to the same bank (typically resulting from a stride
through an array which is a multiple of 8 or 16) result in a full
latency delay on each access.

Most ETA-10 users would really like to see the latency go down so
that bank conflicts would be less trouble on random gathers/scatters.

Disclaimer: I don't work for CDC/ETA.
In fact, I don't work much at all....
--
----------------------   John D. McCalpin   ------------------------
Dept of Oceanography & Supercomputer Computations Research Institute
mccalpin@masig1.ocean.fsu.edu           mccalpin@nu.cs.fsu.edu
--------------------------------------------------------------------