Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!mit-eddie!killer!elg
From: elg@killer.Dallas.TX.US (Eric Green)
Newsgroups: comp.arch
Subject: Re: Register usage
Message-ID: <8104@killer.Dallas.TX.US>
Date: 15 May 89 01:18:59 GMT
References: <259@mindlink.UUCP>
Organization: The Unix(R) Connection, Dallas, Texas
Lines: 52

in article <259@mindlink.UUCP>, a464@mindlink.UUCP (Bruce Dawson) says:
>    One thing that needs to be kept in mind when talking about the advantages
> of huge numbers of registers is that some of the advantages of registers go
> away when you have a lot (when you have a lot available simultaneously I should
> say).  In the extreme case of the computer someone mentioned that had 256
> registers, a register-register operation would use up sixteen bits just to
> specify the two registers involved.  Contrast that with the six bits required
> if you only have eight registers.  Given the finite memory speeds that we have
> to deal with, an extra ten bits so that you can have 256 instead of eight
> registers is probably too big a price to pay and would probably slow programs
> down.

The AMD29000 addresses 256 registers (although there are only 192 actual
registers -- 128 organized as a stack, the rest as global registers).
Each AMD29000 instruction is 32 bits long. I seem to recall that it's a
three-address machine rather than a two-address machine a la
68000/80x86, so each instruction is 8 bits of opcode and 24 bits of
register addresses.

How does this slow it down??? In fact, the 29000 is one of the faster
RISCs out there, though it didn't catch on in the Unix workstation world
(for one thing, the idea of kludging in byte-fetch logic externally
probably turned off potential system designers).
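
To make the bit-count arithmetic above concrete, here is a minimal Python
sketch. It is an illustration only: the function names are hypothetical,
and the field order (opcode in the top byte, then three 8-bit register
numbers) is an assumption made for the example, not a statement of the
29000's actual encoding.

```python
def reg_field_bits(num_registers, operands):
    """Bits needed to name `operands` registers out of `num_registers`."""
    bits_per_register = (num_registers - 1).bit_length()
    return bits_per_register * operands


def pack_three_address(opcode, dest, src_a, src_b):
    """Pack an 8-bit opcode and three 8-bit register numbers into 32 bits.

    Field order is assumed for illustration: opcode | dest | src_a | src_b.
    """
    for name, value in (("opcode", opcode), ("dest", dest),
                        ("src_a", src_a), ("src_b", src_b)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} does not fit in 8 bits: {value}")
    return (opcode << 24) | (dest << 16) | (src_a << 8) | src_b


def unpack_three_address(word):
    """Split a 32-bit word back into (opcode, dest, src_a, src_b)."""
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF,
            (word >> 8) & 0xFF, word & 0xFF)
```

Under these assumptions, two operands out of 256 registers cost 16 bits
and two out of eight cost 6, matching the quoted figures, while three
8-bit register fields plus an 8-bit opcode exactly fill a 32-bit word.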

As I mentioned in a previous posting, program-memory bandwidth is almost
unlimited on these kinds of high-end machines (large cache, Harvard-style
separate program and data buses, possibly interleaved program memory and
burst-mode DRAM accesses by the program cache controller to take
advantage of sequential accesses, etc.). The only glitches in the
pipeline are a) fetching data from memory, and b) branches, so you want
to reduce both as much as possible. That is why you want a lot of
registers and (on the branch side) things like "smart" conditions to
reduce the number of conditional branches, and branch target caches to
minimize the effects when you do get a branch. Under those conditions,
program memory bandwidth becomes almost inconsequential as far as
performance is concerned, and fixed-size 32-bit instructions greatly
ease pipeline design. It's only when you get to low-end non-Harvard
machines like the 68000 that program bandwidth becomes important.

Conclusions:
  The program memory bandwidth increase resulting from increasing the
number of addressable registers is inconsequential. Other factors, such
as real-time responsiveness, compiler technology, and the number of
gates needed to decode the register addresses (remember the Cray axiom
-- minimum # of gates in critical paths) are more important in
determining how many registers to put in your architecture.

--
|    // Eric Lee Green              P.O. Box 92191, Lafayette, LA 70509     |
|   //  ..!{ames,decwrl,mit-eddie,osu-cis}!killer!elg     (318)989-9849     |
|  //    Join the Church of HAL, and worship at the altar of all computers  |
|\X/   with three-letter names (e.g. IBM and DEC). White lab coats optional.|