Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!mit-eddie!killer!elg
From: elg@killer.Dallas.TX.US (Eric Green)
Newsgroups: comp.arch
Subject: Re: Register usage
Message-ID: <8104@killer.Dallas.TX.US>
Date: 15 May 89 01:18:59 GMT
References: <259@mindlink.UUCP>
Organization: The Unix(R) Connection, Dallas, Texas
Lines: 52

in article <259@mindlink.UUCP>, a464@mindlink.UUCP (Bruce Dawson) says:
>      One thing that needs to be kept in mind when talking about the advantages
> of huge numbers of registers is that some of the advantages of registers go
> away when you have a lot (when you have a lot available simultaneously, I should
> say).  In the extreme case of the computer someone mentioned that had 256
> registers, a register-register operation would use up sixteen bits just to
> specify the two registers involved.  Contrast that with the six bits required
> if you only have eight registers.  Given the finite memory speeds that we have
> to deal with, an extra ten bits so that you can have 256 instead of eight
> registers is probably too big a price to pay and would probably slow programs
> down.

The AMD29000 addresses 256 registers (although there are only 192 actual
registers -- 128 organized as a stack, the rest as global registers).
Each AMD29000 instruction is 32 bits long. I seem to recall that it's
a three-address machine instead of a two-address machine a la
68000/80x86, so an instruction is 8 bits of opcode plus 24 bits of
register addresses. How does this slow it down? In fact, the 29000
is one of the faster RISCs out there, though it didn't catch on in the
Unix workstation world (for one thing, the idea of kludging in
byte-fetch logic externally probably turned off potential system
designers).
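
To put numbers on the encoding argument, here is a rough Python sketch.
It computes how many bits the register fields need for a given register
count and operand count, then packs a three-address instruction into one
32-bit word. The function names are made up for illustration, and the
field order (opcode in the top byte, then three 8-bit register fields)
is an assumption on my part, not something taken from the 29000 manual.

```python
def register_field_bits(num_registers: int, operands: int) -> int:
    """Total bits needed to name `operands` registers out of `num_registers`."""
    bits_per_operand = (num_registers - 1).bit_length()
    return bits_per_operand * operands


def encode(opcode: int, rc: int, ra: int, rb: int) -> int:
    """Pack an 8-bit opcode and three 8-bit register numbers into 32 bits.

    Assumed layout (hypothetical): opcode | rc | ra | rb, high byte first.
    """
    for field in (opcode, rc, ra, rb):
        if not 0 <= field <= 0xFF:
            raise ValueError("each field must fit in 8 bits")
    return (opcode << 24) | (rc << 16) | (ra << 8) | rb


def decode(word: int) -> tuple[int, int, int, int]:
    """Split a 32-bit word back into (opcode, rc, ra, rb)."""
    return (
        (word >> 24) & 0xFF,
        (word >> 16) & 0xFF,
        (word >> 8) & 0xFF,
        word & 0xFF,
    )


if __name__ == "__main__":
    # Two-address, 256 registers: the sixteen bits from the quoted article.
    print(register_field_bits(256, 2))
    # Two-address, 8 registers: the six bits from the quoted article.
    print(register_field_bits(8, 2))
    # Three-address, 256 registers: the 24 bits left after an 8-bit opcode.
    print(register_field_bits(256, 3))
    print(hex(encode(0x14, 200, 17, 255)))
```

The point is that 8 + 24 = 32, so all three register fields fit in a
single fixed-size word with nothing left over to fetch.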

As I mentioned in a previous posting, program-memory bandwidth is
almost unlimited on these kinds of high-end machines (large cache,
Harvard-style separate program and data buses, possibly interleaved
program memory and burst-mode DRAM accesses by the program cache
controller to take advantage of sequential accesses, etc.). The only
glitches in the pipeline are a) fetching data from memory, and b)
branches, so you want to reduce both as much as possible. That is why
you want a lot of registers and (on the branch side) things like
"smart" conditions to reduce the number of conditional branches, and
branch target caches to minimize the effects when you do get a branch.
Under those conditions, program memory bandwidth becomes almost
inconsequential as far as performance is concerned, and fixed-size
32-bit instructions greatly ease pipeline design.  It's only when you
get to low-end non-Harvard machines like the 68000 that program
bandwidth becomes important.

Conclusion: The program memory bandwidth increase resulting from
increasing the number of addressable registers is inconsequential.
Other factors, such as real-time responsiveness, compiler technology,
and the number of gates needed to decode the register addresses
(remember the Cray axiom -- minimum # of gates in critical paths) are
more important in determining how many registers to put in your
architecture.

--
|    // Eric Lee Green              P.O. Box 92191, Lafayette, LA 70509     |
|   //  ..!{ames,decwrl,mit-eddie,osu-cis}!killer!elg     (318)989-9849     |
|  //    Join the Church of HAL, and worship at the altar of all computers  |
|\X/   with three-letter names (e.g. IBM and DEC). White lab coats optional.|