Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site ima.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!bellcore!decvax!ima!johnl
From: johnl@ima.UUCP (John R. Levine)
Newsgroups: net.arch
Subject: Re: RISC cache vs CISC u-code
Message-ID: <148@ima.UUCP>
Date: Mon, 10-Mar-86 12:26:24 EST
Article-I.D.: ima.148
Posted: Mon Mar 10 12:26:24 1986
Date-Received: Wed, 12-Mar-86 21:52:54 EST
References: <136@pyramid.UUCP> <570@imag.UUCP> <4521@think.ARPA> <765@harvard.UUCP>
Reply-To: johnl@ima.UUCP (John R. Levine)
Organization: Javelin Software Corp.
Lines: 33
Summary: who needs an instruction cache?

In article <765@harvard.UUCP> reiter@harvard.UUCP (Ehud reiter) writes:
>The numerous articles on RISC machines have all assumed that such machines have
>caches. However, the only commercial RISC machine that I'm familiar with, the
>IBM PC/RT, does NOT have a cache, and seems to suffer a factor of 3 performance
>degradation because of this (2 MIPS instead of 6 MIPS). To quote from IBM RT
>PERSONAL COMPUTER TECHNOLOGY (probably available from your friendly
>neighborhood IBM salesman), pg 48 - "The 801 minicomputer ... had exceptionally
>high performance. However, much of its performance depended on its two caches,
>which can deliver an instruction word and a data word on each CPU cycle. SINCE
>SUCH CACHES WERE PROHIBITIVELY COSTLY FOR SMALL SYSTEMS ..." (emphasis mine).

Hmmn. If you continued reading a few pages past that quote, you'd find that
the ROMP has other architectural aspects that mitigate the effects of having
no cache. For one thing, the ROMP does have four words of instruction
prefetch buffer which gives the chip some latitude in when it fetches its
instructions. It also has an extremely fast bus, the ROMP Storage Channel,
which can handle a transfer every cycle and allows several transfers to be
outstanding, since each request has a five-bit tag which the slave device
passes back for matching up by the master.
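
To make the tag matching concrete, here is a rough sketch in Python of how a
master might keep several requests outstanding and pair each reply with its
request. Only the five-bit tag width comes from the description above; the
Master class, its methods, the addresses, and the reversed reply order are all
made up for illustration and say nothing about how the real hardware is built.

```python
from collections import deque

TAG_BITS = 5                  # tag width as described in the post
NUM_TAGS = 1 << TAG_BITS      # 32 distinct tag values


class Master:
    """Hypothetical bus master that tracks outstanding requests by tag."""

    def __init__(self):
        self.free_tags = deque(range(NUM_TAGS))
        self.outstanding = {}  # tag -> address of the pending request

    def issue(self, address):
        """Start a request; return its tag, or None if all tags are in use."""
        if not self.free_tags:
            return None
        tag = self.free_tags.popleft()
        self.outstanding[tag] = address
        return tag

    def complete(self, tag, data):
        """Pair a reply with its request using the tag the slave passed back."""
        address = self.outstanding.pop(tag)
        self.free_tags.append(tag)
        return address, data


def run_demo():
    """Issue three reads, then deliver the replies in reverse order."""
    memory = {0x100: "A", 0x104: "B", 0x108: "C"}  # assumed slave contents
    master = Master()
    requests = [(master.issue(addr), addr) for addr in (0x100, 0x104, 0x108)]
    # The slave echoes each tag with its data; order of arrival is arbitrary.
    replies = [(tag, memory[addr]) for tag, addr in reversed(requests)]
    return [master.complete(tag, data) for tag, data in replies]
```

Because every reply carries its tag, the master never has to assume that
replies come back in the order the requests went out, which is what lets
several transfers overlap.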
Memory can be interleaved many ways to allow lots of cycles to be going at
once. The technology book on p. 58 says that the chip only uses 60% to 70%
of the bus bandwidth, which suggests that adding a cache wouldn't help as
much as you'd think.

Software can also be of some help here -- for example, there are
instructions for unpacking the bytes in a register, and I gather that the
PL.8 compiler tries to fetch fullwords and unpack them rather than fetching
several adjacent bytes separately.
-- 
John Levine, Javelin Software, Cambridge MA +1 617 494 1400
{ decvax | harvard | think | ihnp4 | cbosgd }!ima!johnl, Levine@YALE.ARPA
The opinions above are solely those of a 12 year old hacker who has broken
into my account, and not those of my employer or any other organization.