Path: utzoo!utgpu!watmath!att!rutgers!jarvis.csri.toronto.edu!db.toronto.edu!jonah
From: jonah@db.toronto.edu (Jeffrey Lee)
Newsgroups: comp.arch
Subject: Re: ATTACK OF KILLER MICROS (Actual
Message-ID: <1989Nov19.224256.29611@jarvis.csri.toronto.edu>
Date: 20 Nov 89 03:42:56 GMT
References: <221@dg.dg.com> <3300083@m.cs.uiuc.edu>
Lines: 64

nelson@m.cs.uiuc.edu writes:

>> parallelism to continue to deliver more performance.  If you project the
>> slope of the clock rates of supercomputers, you will see sub-nanosecond
>> CYCLE times before 1995.  I don't see any technologies in the wings which
>> promise to allow this to continue...
>Actually, I don't see this (dare I say it) EVER occurring.

NEVER say "never." :-)

>                                                           Ignoring
>  delay due to capacitance, a nanosecond is only 12 inches of wire --
>  and I'm reasonably sure that the "critical path" length is at least
>  on the order of a foot (does anyone know?).  Once capacitance delay
>  comes into the picture (even on-chip there is a significant amount),
>  even with new technologies, that 12 inches is being reduced at least
>  a tenfold (opinion/guess).  That leaves you with an inch of wiring
>  for the critical path for this super technology -- that does not
>  seem nearly enough to build a nano-processor around. 

Hierarchy and locality are wonderful for dodging these sorts of
problems.  Put a large register set, simple ALU, and tiny instruction
cache onto a single GaAs or ECL (or whatever) chip.  Assume a 4-level
memory where each of the first three levels has a .8 hit rate and the
next level down is 5 times slower and 64 times larger:

	level	access	hit   Ehit(ns)	size
	1	1ns	.8	0.8	256B
	2	5ns	.16	1.6	16KB
	3	25ns	.032	2.4	1MB
	4	125ns	.008	3.4	64MB+	[294 MW/s ==> 150 MIPS]

Now, 5ns gives you just enough time to get off the chip to a close
neighbour cache chip, 25ns gives you enough time to get elsewhere on
the board, and 125ns is enough time to go to the bus.  Each critical
path gets slightly longer and slightly slower.  Each level can be made
from a slower and cheaper technology.  With a hit rate of .8, the
effective access time is 3.4 ns/word, or about 294 Mwords/s, which
should put you in the 150 MIPS range with RISC technology.  [The ratio
of 2 W ==> 1 MIPS assumes that each operation (on average) uses one
instruction word and one data word.  The SPARC seems to have a MIPS
rating of about 1/2 its MHz.]
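
The arithmetic behind the table above can be checked with a short
script.  This is only a sketch of my reading of the table: the names
hierarchy_table and mips_estimate are made up for illustration, the
last level is assumed to catch every access that the faster levels
miss, and the Ehit column is taken to be a running sum of
(fraction of accesses) x (access time).

```python
def hierarchy_table(hit_rate, access_ns=1.0, slowdown=5.0, levels=4):
    """Return rows of (level, access_ns, fraction, cumulative_ns).

    Assumption: every level except the last satisfies `hit_rate` of the
    accesses that reach it; the last level satisfies all that remain.
    """
    rows = []
    remaining = 1.0    # fraction of accesses not yet satisfied
    cumulative = 0.0   # running expected access time in ns
    for level in range(1, levels + 1):
        if level == levels:
            fraction = remaining            # last level always hits
        else:
            fraction = remaining * hit_rate
        cumulative += fraction * access_ns
        rows.append((level, access_ns, fraction, cumulative))
        remaining -= fraction
        access_ns *= slowdown               # next level is `slowdown` times slower
    return rows


def mips_estimate(effective_ns, words_per_instruction=2.0):
    """Convert ns/word to MIPS, assuming 2 words (1 instruction, 1 data) per op."""
    mwords_per_second = 1000.0 / effective_ns
    return mwords_per_second / words_per_instruction


if __name__ == "__main__":
    rows = hierarchy_table(0.8)
    for level, access, fraction, cumulative in rows:
        print(f"{level}\t{access:g}ns\t{fraction:.3f}\t{cumulative:.2f}")
    effective = rows[-1][3]
    print(f"effective access: {effective:.2f} ns/word, "
          f"about {mips_estimate(effective):.0f} MIPS")
```

Changing the hit_rate argument is enough to try other hats; the
slowdown and level count are parameters too.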

OK, so the numbers are all out of a hat.  Let's try some different hats:

	level	access	hit   Ehit(ns)	size
	1	1ns	.7	0.7	256B
	2	5ns	.21	1.75	16KB
	3	25ns	.063	3.33	1MB
	4	125ns	.027	6.7	64MB+	[149 MW/s ==> 75 MIPS]

	level	access	hit   Ehit(ns)	size
	1	1ns	.9	0.9	256B
	2	5ns	.09	1.35	16KB
	3	25ns	.009	1.58	1MB
	4	125ns	.001	1.7	64MB+	[588 MW/s ==> 300 MIPS]

I'm more inclined to believe the values of .8 or .9 for locality given
the 64x expansion at each level.  I've no facts though.
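
For what it's worth, the three tables collapse into one closed form.
The symbols here are my own shorthand, not anything standard: t is the
level-1 access time, s the slowdown per level, h the per-level hit
rate, n the number of levels, and the last level is assumed to always
hit.

```latex
E \;=\; t\left[\, h\sum_{k=0}^{n-2} r^{k} \;+\; r^{\,n-1} \right],
\qquad r = (1-h)\,s ,
\qquad
\sum_{k=0}^{n-2} r^{k} =
\begin{cases}
  \dfrac{1-r^{\,n-1}}{1-r}, & r \neq 1,\\[1.5ex]
  n-1, & r = 1 .
\end{cases}
```

With t = 1 ns, s = 5, n = 4: h = .8 gives r = 1 and E = 3.4 ns; h = .7
gives r = 1.5 and E = 6.7 ns; h = .9 gives r = .5 and E = 1.7 ns.  So
the quantity to watch is r: while (1-h)s stays at or below 1, each
deeper level adds no more to the average than the one above it.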

Is a 5ns single-chip 16KB cache possible, now or in 5 years?
What about a 25ns multi-chip 1MB cache?
What is the normal hit rate for a 16KB cache?
Comments?