Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: Notesfiles $Revision: 1.7.0.10 $; site ccvaxa
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxt!houxm!ihnp4!inuxc!pur-ee!uiucdcs!ccvaxa!aglew
From: aglew@ccvaxa.UUCP
Newsgroups: net.arch
Subject: Re: RISC cache vs CISC u-code
Message-ID: <5100024@ccvaxa>
Date: Sat, 8-Mar-86 22:03:00 EST
Article-I.D.: ccvaxa.5100024
Posted: Sat Mar 8 22:03:00 1986
Date-Received: Wed, 12-Mar-86 02:10:06 EST
References: <136@pyramid.UUCP>
Lines: 38
Nf-ID: #R:pyramid.UUCP:136:ccvaxa:5100024:000:2432
Nf-From: ccvaxa.UUCP!aglew Mar 8 21:03:00 1986

Responding to billw at navajo.ARPA, who was responding to...

I agree with your basic point, but there's another aspect to RISCs: there is a big difference at the moment between hardware, where it is easy to do things in parallel, and software, where it isn't. Microcode is just software used to implement sequential operations. One of the things we can do to increase speed is to make sequential operations parallel, which usually comes down to implementing serial operations combinationally. Whenever you have a serial operation that cannot be made parallel, there are usually enough special cases that can be detected at compile time to make a standard library function suboptimal - and this is just as true for microcode as it is for a matrix math library. (Just how many different forms of matrix multiplication are there: block, upper triangular, band, sparse...?)

Somebody else was talking about caches. Here are some random musings: registers are just caches explicitly controlled by software. Register windows are specially structured stack caches. We should have a special cache for each frequently used data type, with a fetch/replacement strategy optimized for that data type. Instructions and data are different, so they need different caches.
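
To make the "register windows are stack caches" remark a bit more concrete, here is a toy sketch in Python. It is only an illustration under assumptions of my own, not a model of any real machine: the class name RegisterWindows, the counters, and the rule "one window per call frame, spill the oldest on overflow, refill one on underflow" are all hypothetical simplifications.

```python
class RegisterWindows:
    """Toy model: the topmost `n_windows` call frames live in registers.

    Assumption (not from any real architecture): each call uses exactly
    one window, and windows are spilled or refilled one at a time.
    """

    def __init__(self, n_windows):
        self.n_windows = n_windows
        self.depth = 1      # current call depth; the outermost frame is 1
        self.resident = 1   # how many of the topmost frames are in registers
        self.spills = 0     # overflow events: a window written out to memory
        self.fills = 0      # underflow events: a window read back from memory

    def call(self):
        self.depth += 1
        if self.resident == self.n_windows:
            # No free window: the oldest resident frame goes to memory.
            self.spills += 1
        else:
            self.resident += 1

    def ret(self):
        if self.depth == 1:
            raise RuntimeError("return from the outermost frame")
        self.depth -= 1
        self.resident -= 1
        if self.resident == 0:
            # The caller's frame was spilled earlier; bring it back.
            self.fills += 1
            self.resident = 1


# Deep excursion: 5 nested calls with only 4 windows.
deep = RegisterWindows(4)
for _ in range(5):
    deep.call()
for _ in range(5):
    deep.ret()
print(deep.spills, deep.fills)        # prints "2 2"

# Shallow call/return traffic never touches memory.
shallow = RegisterWindows(4)
for _ in range(100):
    shallow.call()
    shallow.ret()
print(shallow.spills, shallow.fills)  # prints "0 0"
```

The replacement policy here falls out of the data type: a stack only ever needs its top few frames, so "evict the deepest frame" is the obvious choice and no general-purpose policy is involved.
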
We have both transparent and explicitly controlled (registers) data caches; instruction caches are usually transparent, not explicitly controlled. Could explicitly controlled instruction caches be useful? (Ask MU5.) The likely bit on branches is a start. Overlays are an explicitly controlled instruction cache mechanism.

An instruction cache should have automatic linear prefetch, and should probably try to prefetch the heads of procedures. It should try to keep return points in the cache. Heads of loops should be left in the cache once fetched; backward branches can be used as a clue to finding heads of loops, but are no good if the loop is long - which is exactly when you want to keep the loop head in the cache.

What we need is a special mark for heads of loops - perhaps an explicit instruction, perhaps just a bit in an instruction, perhaps branch tables as in MU5. Perhaps this could be used to minimize loop overhead for loops with a while test at the top, rather than an until test at the bottom: the branch back to the test at the top could automatically fire off the head-of-loop instruction, so it might be possible to execute them both in one cycle.
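
Here is a rough sketch, in Python, of why a marked loop head matters for long loops. Everything in it is an assumption made for illustration: the function run_loop, one instruction per cache line, LRU replacement, a linear prefetcher that always hides the miss on the sequential successor of the previous fetch, and "marking" modelled as simply never evicting the head line.

```python
from collections import OrderedDict


def run_loop(body_len, iterations, capacity, pin_head):
    """Count exposed fetch stalls for a loop of `body_len` instructions.

    Hypothetical model: one instruction per line, LRU replacement, and
    a linear prefetcher that covers any fetch of address prev + 1.
    If `pin_head` is true, the loop-head line is never evicted.
    """
    assert capacity >= 2, "need room for the pinned head plus one line"
    cache = OrderedDict()   # address -> None, least recently used first
    head = 0
    stalls = 0
    prev = None
    for _ in range(iterations):
        for addr in range(head, head + body_len):
            hit = addr in cache
            sequential = prev is not None and addr == prev + 1
            if not hit and not sequential:
                stalls += 1           # a miss the prefetcher could not hide
            if hit:
                cache.move_to_end(addr)
            else:
                if len(cache) >= capacity:
                    # Evict the least recently used line, skipping a pinned head.
                    for victim in cache:
                        if not (pin_head and victim == head):
                            break
                    del cache[victim]
                cache[addr] = None
            prev = addr
    return stalls


# Long loop (8 instructions, 4 lines of cache), 5 iterations.
print(run_loop(8, 5, 4, pin_head=False))  # prints 5: the head misses every time
print(run_loop(8, 5, 4, pin_head=True))   # prints 1: only the first fetch stalls
# Short loop that fits in the cache: no marking needed.
print(run_loop(3, 5, 4, pin_head=False))  # prints 1
```

In this toy model the linear prefetch takes care of the straight-line body, so the only stall left is the branch back to the top, and that is exactly the line a plain LRU cache has thrown away by the time a long loop comes around again.
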