Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!sun-barr!texsun!playroom!pitstop!acockcroft
From: acockcroft@pitstop.West.Sun.COM (Adrian Cockcroft)
Newsgroups: comp.arch
Subject: Re: Fast DRAMs and caches (was Re: cache speed)
Message-ID: <839@pitstop.West.Sun.COM>
Date: 31 Aug 89 17:24:58 GMT
References: <26964@amdcad.AMD.COM> <1989Aug25.225511.828@mentor.com>
Organization: Sun Microsystems, Mt. View, CA
Lines: 42

> I take it you are proposing using the VRAM serial port for instruction fetches.
> That would work out fine for linear code sequences, but would cost you a full
> RAM cycle to do a branch. Delayed branch architectures could mitigate that
> cost somewhat, but it would still cost you. It's cheaper than a cache, but not
> as fast as a good one. Has anyone tried using VRAMs like that in a real
> system?
> --
> Michael Butts, Research Engineer KC7IT 503-626-1302
> Mentor Graphics Corp., 8500 SW Creekside Place, Beaverton, OR 97005

The Sun 4/110 is pretty close. It has no extra cache memory, just the
main DRAM banks, which are built out of NMB2800 256K static column DRAMs
or 1M fast page mode DRAMs with access times of about 80ns. The effect
is to have a cache with one line per bank of RAM, where each line is
about 1K long. I think an 8 Mbyte 4/110 using 256K RAMs had a total of
2K effective cache and a 32 Mbyte 4/110 had a 4K effective cache.

This is cheap, but it is not all that effective, and Sun hasn't used that
design again. One side effect is that benchmark times can vary by as much
as +/-30% in pathological cases, so you need to average the results of
lots of runs. The problems occur when the current data page and the
current instruction page are both in the same bank of memory and are
sharing one cache line, so that every load and store causes a cache miss.
Put a real cache in....

Note that for vector processors VRAMs can be used nicely to build your
vector pipe. This was done on the FPS T-series and (I think) on the
Sun/Trancept TAAC-1.

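The bank-conflict effect described above for the 4/110 can be illustrated
with a toy model. This is only a rough sketch under stated assumptions,
not a description of the real memory controller: the two-bank layout, the
4 Mbyte bank size, the contiguous address-to-bank mapping, and the names
locate() and miss_rate() are all hypothetical. Each bank remembers one
open row of about 1K, and an access to any other row in that bank counts
as a miss.

```python
# Toy model of "one cache line per DRAM bank" (page mode / static column).
# All constants are assumptions for illustration; the post does not give
# the real 4/110 address mapping.
ROW_BYTES = 1024                 # assumed: about 1K per open row
BANK_BYTES = 4 * 1024 * 1024     # assumed: size of one bank
NUM_BANKS = 2                    # assumed: two banks -> 2K "effective cache"


def locate(addr):
    """Return (bank, row) for a byte address, assuming contiguous banks."""
    bank = (addr // BANK_BYTES) % NUM_BANKS
    row = (addr % BANK_BYTES) // ROW_BYTES
    return bank, row


def miss_rate(code_base, data_base, steps=1000):
    """Alternate one instruction fetch and one data access per step and
    return the fraction of accesses that had to open a new row."""
    open_row = [None] * NUM_BANKS    # the single open row held by each bank
    misses = 0
    accesses = 0
    for i in range(steps):
        offset = 4 * (i % 64)        # stay inside one 1K row of each region
        for addr in (code_base + offset, data_base + offset):
            bank, row = locate(addr)
            accesses += 1
            if open_row[bank] != row:
                misses += 1          # full RAM cycle: open a different row
                open_row[bank] = row
    return misses / accesses


if __name__ == "__main__":
    # Code and data in different banks: only the first touch of each misses.
    print("separate banks:", miss_rate(0, BANK_BYTES))
    # Code and data in different rows of the same bank: they evict each other.
    print("same bank:     ", miss_rate(0, 8 * ROW_BYTES))
```

In this model the miss rate depends entirely on where code and data happen
to land, which is consistent with the run-to-run benchmark variation
mentioned above. The model says nothing about the actual size of that
variation.
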
The most effective DRAM controller I have come across is the Intel 82786
graphics chip. You can get about 40 Mbyte/s through a 16-bit-wide
interface by using interleaved fast page mode DRAMs, and you need hardly
any glue logic. It uses this bandwidth to implement "windows in hardware",
which won't work with VRAMs because you have to fetch data from all over
the place.

Adrian

--
Adrian Cockcroft Sun Cambridge UK TSE sun!sunuk!acockcroft
Disclaimer: These are my own opinions