Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: Notesfiles $Revision: 1.7.0.10 $; site ccvaxa
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxt!houxm!ihnp4!inuxc!pur-ee!uiucdcs!ccvaxa!aglew
From: aglew@ccvaxa.UUCP
Newsgroups: net.arch
Subject: Re: RISC cache vs CISC u-code
Message-ID: <5100024@ccvaxa>
Date: Sat, 8-Mar-86 22:03:00 EST
Article-I.D.: ccvaxa.5100024
Posted: Sat Mar  8 22:03:00 1986
Date-Received: Wed, 12-Mar-86 02:10:06 EST
References: <136@pyramid.UUCP>
Lines: 38
Nf-ID: #R:pyramid.UUCP:136:ccvaxa:5100024:000:2432
Nf-From: ccvaxa.UUCP!aglew    Mar  8 21:03:00 1986


Responding to billw at navajo.ARPA, who was responding to...

I agree with your basic point, but there's another aspect to RISCs:
there is a big difference at the moment between hardware, where it is 
easy to do things in parallel, and software, where it isn't. Microcode
is just software used to implement sequential operations. One of the 
things we can do to increase speed is to make sequential operations 
parallel, which usually comes down to implementing serial operations
in combinational logic. Whenever you have a serial operation that cannot be
made parallel, there are usually enough special cases that can be detected
at compile time to make a standard library function suboptimal - and this
is just as true for microcode as it is for a matrix mathematical library.
(Just how many different forms of matrix multiplication are there:
block, upper triangular, band, sparse...).
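
To make the matrix point concrete, here is a rough sketch (mine, not from any
real library; the function names and the multiply counter are made up for
illustration). It compares a general multiply with one that assumes both
operands are upper triangular, a fact a compiler or caller might know ahead
of time but a one-size-fits-all routine cannot exploit.

```python
def matmul_general(a, b):
    """Textbook n x n multiply; returns (product, number of multiplies)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults


def matmul_upper(a, b):
    """Same product, ASSUMING a and b are both upper triangular.

    a[i][k] is zero for k < i and b[k][j] is zero for k > j, so only
    i <= k <= j can contribute, and the result is itself upper triangular.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(i, n):
            for k in range(i, j + 1):
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults


a = [[1, 2, 3], [0, 4, 5], [0, 0, 6]]
b = [[7, 8, 9], [0, 10, 11], [0, 0, 12]]
general_product, general_mults = matmul_general(a, b)   # 27 multiplies
upper_product, upper_mults = matmul_upper(a, b)         # 10 multiplies
```

For 3x3 operands the specialized version does 10 multiplies against 27 and
gets the same answer; the same trade shows up whether the "library" is a
math package or a microcoded instruction.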

Somebody else was talking about caches. Here are some random musings:
registers are just caches explicitly controlled by software. Register windows
are specially structured stack caches. We should have a special cache for
each frequently used data type, with a fetch/replacement strategy optimized 
for that data type.
	Instructions and data are different, so they need different caches.
We have both transparent and explicitly controlled (registers) data caches;
instruction caches are usually transparent, not explicitly controlled. Could
explicitly controlled instruction caches be useful? (Ask MU5). A "likely
taken" bit on branches is a start. Overlays are an explicitly controlled instruction 
cache mechanism. An instruction cache should have automatic linear prefetch,
and should probably try to prefetch the heads of procedures. It should try to
keep return points in the cache. Heads of loops should be left in the cache
once fetched; backward branches can be used as a clue to finding heads of
loops, but are no good if the loop is long - which is exactly when you want to
keep the loop head in the cache. What we need is a special mark for heads of
loops - perhaps an explicit instruction, perhaps just a bit in an instruction,
perhaps branch tables as in MU5. Perhaps this could be used to minimize loop
overhead for loops with a while test at the top, as opposed to loops with an
until test at the bottom: the branch back to the test at the top could
automatically fire off the head-of-loop instruction, so it might be possible
to execute them both in one cycle.
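
The loop-head idea above can be tried out in a toy simulation. This is only a
sketch under my own assumptions: a fully associative LRU cache holding one
instruction per line, with any address in a hypothetical `loop_heads` set
held in the cache once fetched and never evicted. It does not model MU5 or
any real machine.

```python
from collections import Counter, OrderedDict


def simulate(trace, capacity, loop_heads=frozenset()):
    """Count misses per address for an LRU cache of `capacity` lines.

    Addresses in `loop_heads` are pinned after their first fetch, which
    stands in for an explicit "head of loop" mark in the instruction stream.
    """
    pinned = set()          # marked loop heads, never evicted
    lru = OrderedDict()     # ordinary lines, least recently used first
    misses = Counter()
    for addr in trace:
        if addr in pinned:
            continue                    # hit on a pinned loop head
        if addr in lru:
            lru.move_to_end(addr)       # hit on an ordinary line
            continue
        misses[addr] += 1
        if len(pinned) + len(lru) >= capacity:
            if not lru:
                continue                # all lines pinned: bypass the cache
            lru.popitem(last=False)     # evict the least recently used line
        if addr in loop_heads:
            pinned.add(addr)
        else:
            lru[addr] = True
    return misses


# A "long" loop: 8 instructions, executed 5 times, in a 4-line cache.
trace = list(range(8)) * 5
plain = simulate(trace, capacity=4)
marked = simulate(trace, capacity=4, loop_heads={0})
```

With plain LRU the loop is longer than the cache, so the head (address 0)
misses on all 5 iterations; with the head marked it misses once. The rest of
the loop body still thrashes, which is where linear prefetch would have to
earn its keep.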