Path: utzoo!utgpu!watmath!att!rutgers!jarvis.csri.toronto.edu!db.toronto.edu!jonah
From: jonah@db.toronto.edu (Jeffrey Lee)
Newsgroups: comp.arch
Subject: Re: ATTACK OF KILLER MICROS (Actual
Message-ID: <1989Nov19.224256.29611@jarvis.csri.toronto.edu>
Date: 20 Nov 89 03:42:56 GMT
References: <221@dg.dg.com> <3300083@m.cs.uiuc.edu>
Lines: 64

nelson@m.cs.uiuc.edu writes:

>> parallelism to continue to deliver more performance.  If you project the
>> slope of the clock rates of supercomputers, you will see sub-nanosecond
>> CYCLE times before 1995.  I don't see any technologies in the wings which
>> promise to allow this to continue...

>Actually, I don't see this (dare I say it) EVER occurring.

NEVER say "never." :-)

> Ignoring
> delay due to capacitance, a nanosecond is only 12 inches of wire --
> and I'm reasonably sure that the "critical path" length is at least
> on the order of a foot (does anyone know?).  Once capacitance delay
> comes into the picture (even on-chip there is a significant amount),
> even with new technologies, that 12 inches is being reduced at least
> a tenfold (opinion/guess).  That leaves you with an inch of wiring
> for the critical path for this super technology -- that does not
> seem nearly enough to build a nano-processor around.

Hierarchy and locality are wonderful for dodging these sorts of
problems.  Put a large register set, simple ALU, and tiny instruction
cache onto a single GaAs or ECL (or whatever) chip.  Assume a 4-level
memory where each of the first three levels has a .8 hit rate, and
each level is 5 times slower and 64 times larger than the one before
it:

    level   access   hit     Ehit(ns)   size
      1       1ns    .8        1.0      256B
      2       5ns    .16       1.6      16KB
      3      25ns    .032      2.4      1MB
      4     125ns    .008      3.4      64MB+   [294 MW/s ==> 150 MIPS]

Now, 5ns gives you just enough time to get off the chip to a close
neighbour cache chip, 25ns gives you enough time to get elsewhere on
the board, and 125ns is enough time to go to the bus.  Each critical
path gets slightly longer and slightly slower.
Each level can be made from a slower and cheaper technology.

With a hit rate of .8, the effective access time is 3.4 ns/word, or
294 million words/s (MW/s), which should put you in the 150 MIPS range
with RISC technology.  [The ratio of 2 MW/s ==> 1 MIPS assumes that
each operation (on average) uses one instruction and one data word.
The SPARC seems to have a MIPS rating of about 1/2 its MHz.]

Ok, so the numbers are all out of a hat.  Let's try some different
hats:

    level   access   hit     Ehit(ns)   size
      1       1ns    .7        1.0      256B
      2       5ns    .21       1.75     16KB
      3      25ns    .063      3.33     1MB
      4     125ns    .027      6.7      64MB+   [149 MW/s ==> 75 MIPS]

    level   access   hit     Ehit(ns)   size
      1       1ns    .9        1.0      256B
      2       5ns    .09       1.35     16KB
      3      25ns    .009      1.58     1MB
      4     125ns    .001      1.7      64MB+   [588 MW/s ==> 300 MIPS]

I'm more inclined to believe the values of .8 or .9 for locality given
the 64x expansion at each level.  I've no facts though.

Is a 5ns single-chip 16KB cache possible, now or in 5 years?  What
about a 25ns multi-chip 1MB cache?  What is the normal hit rate for a
16KB cache?

Comments?
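
For anyone who wants to try other hats, the arithmetic behind the
tables can be sketched in a few lines of Python.  This is only a
sketch under the assumptions stated above: every level but the last
has the same local hit rate, each level is 5 times slower than the
one before it, the last level always hits, and 2 words of memory
traffic buy one instruction.  The function and parameter names are
made up for illustration.

```python
def effective_access_ns(hit_rate, levels=4, base_ns=1.0, slowdown=5.0):
    """Average access time (ns/word) of a memory hierarchy.

    Assumptions taken from the tables: level 1 costs base_ns, each
    further level is `slowdown` times slower, every level but the last
    catches `hit_rate` of the references that reach it, and the last
    level catches whatever is left.
    """
    total = 0.0
    reach = 1.0        # fraction of all references that get this far
    access = base_ns   # access time of the current level
    for level in range(1, levels + 1):
        # The last level always hits; the others hit at `hit_rate`.
        frac = reach if level == levels else reach * hit_rate
        total += frac * access
        reach -= frac
        access *= slowdown
    return total


def estimated_mips(ns_per_word, words_per_instruction=2.0):
    """Rough MIPS from memory bandwidth (2 words per operation assumed)."""
    mwords_per_sec = 1000.0 / ns_per_word  # 1 ns/word == 1000 MW/s
    return mwords_per_sec / words_per_instruction


if __name__ == "__main__":
    for h in (0.7, 0.8, 0.9):
        t = effective_access_ns(h)
        print(f"hit={h:.1f}  {t:.2f} ns/word  "
              f"{1000.0 / t:.0f} MW/s  {estimated_mips(t):.0f} MIPS")
```

For hit rates of .7, .8 and .9 this works out to 6.7, 3.4 and 1.7
ns/word, i.e. roughly 75, 147 and 294 MIPS, which the tables round to
75, 150 and 300.  Changing `slowdown` or `levels` is an easy way to
see how sensitive the result is to the guesses.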