Path: utzoo!attcan!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: 80486 vs. 68040 code size [really: how many regs]
Message-ID: <950@aber-cs.UUCP>
Date: 15 May 89 17:56:19 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: eunet,world
Organization: Dept of CS, UCW Aberystwyth
	(Disclaimer: my statements are purely personal)
Lines: 137

In article <19413@winchester.mips.COM> mash@mips.COM (John Mashey) writes:

    	These are not special case assumptions, although, of course the
    	example was put together to illustrate the point.
    	1) These days, many machines are reg-reg architectures.

Ahhh. This is a ridiculous argument. Proof-by-numbers is dangerous...
Can over 1 billion Chinese be wrong in using abaci? :-) :-)

    	2) Optimizers do find this kind of stuff, especially in FORTRAN
    	codes or some of the larger hunks of C.  they don't have to be used
    	heavily, they just have to be used enough to make it worthwhile.

Again, on reg-reg machines. On Fortran, numeric code, as you say.

    	3) I didn't realize the discussion was limited to reg-mem
    	architectures: I recall the following part of a posting:
		[ ... ]
Well, well. The discussion was not *limited*, it was in the context of reg-mem
machines. In extending it to RISC machines (about which, as I said, there is
even less data), you did something good. But please don't apply my statements,
made in a 68020-386 debate, to RISC machines. On these, my hunch is that more
registers are OK.

    I did, although I'm sorry it wasn't more.  It's not something we have
    much motivation to keep current, unlike the plethora of other statistics
    that we keep around.  Should be good paper topic for somebody.

Seconded. Might have a go myself... I have a 386 at home, and will have g++
on it soon. Who can give me a reg-reg machine for comparison :-) ?

    It IS important to account for other issues, and there's no reason
    that the answer should be the same for other architectural designs.

    I.e., if you end up with memory ops, rather than load-store, you're less
    motivated to have more registers [you'd think].  On the other hand,
    even there, you sometimes win because of cycle-count (not instruction-count)
    issues, i.e., the latency cycle(s) you get on most machines from fetching
    data from memory (even cache memory).

Wise words. Very agreed. Still, the instruction count/code density issue
has its weight in system performance, of course.
    
    You'd be surprised, especially in heavily-pipelined machines.  You must be
    thinking of counting INSTRUCTIONS, not cycles: most fast (i.e., seriously
    pipelined) machines cost you a stall cycle if you want to fetch something
    from memory and use it right away, so even on a machine with mem->reg
    operations, you might choose to sometimes generate a load, followed by an
    op, because you might be able to rearrange code and get something in to
    cover the load latency. [people sometimes found this on the S/370s].
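
To make the cycles-versus-instructions point above concrete, here is a toy
model in C. It is only a sketch under stated assumptions: every instruction
issues in one cycle, and an instruction that reads a register written by the
load immediately before it pays one stall cycle. The `insn` layout, the name
`count_cycles` and the two sample sequences are hypothetical, not taken from
any real machine or compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction (hypothetical layout): one destination register and up
   to two source registers; register number -1 means "unused". */
struct insn {
    int is_load;   /* nonzero if dst is loaded from memory */
    int dst;
    int src1;
    int src2;
};

/* Assumed cost model: one cycle per instruction, plus one stall cycle when
   an instruction uses the result of the load immediately preceding it. */
int count_cycles(const struct insn *code, size_t n)
{
    size_t i;
    int cycles = 0;

    assert(code != NULL || n == 0);
    for (i = 0; i < n; i++) {
        cycles++;                               /* issue cycle */
        if (i > 0 && code[i - 1].is_load) {
            int d = code[i - 1].dst;
            if (d >= 0 && (code[i].src1 == d || code[i].src2 == d))
                cycles++;                       /* load-use stall */
        }
    }
    return cycles;
}

/* load r1; r2 = r1 + r3; load r4; r5 = r4 + r3
   Each add uses the value loaded just before it. */
const struct insn naive[] = {
    {1, 1, -1, -1}, {0, 2, 1, 3}, {1, 4, -1, -1}, {0, 5, 4, 3},
};

/* Same four instructions with both loads hoisted, so that each load's
   latency is covered by another instruction. */
const struct insn scheduled[] = {
    {1, 1, -1, -1}, {1, 4, -1, -1}, {0, 2, 1, 3}, {0, 5, 4, 3},
};
```

Under this model both sequences have the same instruction count but the
hoisted one takes fewer cycles, which is the effect being described; real
pipelines have more complicated rules.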
        	
    >    Why don't you like inter-expression register assignments?
    
    >Well, I like them, as long as the compiler does not do them, but the
    >programmer does, by using explicit "register" declarations....
    
    OPINION: the above statement sends me back to 15-20 years ago....
    really, if you believe this, you are not keeping up with what's
    happening in the computer business.

This is going back to the "volatile" debate, in the wrong newsgroup. I keep
up, but occasionally I disagree, especially when none of the great and good
over a period of months was able to quote figures to support their opinion.
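
For anyone who missed the earlier thread, the kind of programmer-directed
assignment in question looks roughly like the C sketch below. The function
name `dot` and the choice of which variables to tag are purely illustrative
assumptions; note also that `register` is only a hint, and a compiler is
free to ignore it.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of products, with the variables the programmer considers hot
   explicitly declared "register" (illustrative choice only). */
long dot(const int *a, const int *b, size_t n)
{
    register const int *p = a;   /* walked on every iteration */
    register const int *q = b;   /* walked on every iteration */
    register long sum = 0;       /* accumulator */
    size_t i;                    /* deliberately left untagged */

    assert(n == 0 || (a != NULL && b != NULL));
    for (i = 0; i < n; i++)
        sum += (long)*p++ * *q++;
    return sum;
}
```

Whether the tagged variables really end up in registers depends entirely on
the compiler and the machine.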
    
    >Ahhhhhhhh. What you are saying is that you are using registers as a
    >statically allocated cache, and that this is good not because they are
    >frequently used, but because they would otherwise be frequently
    >saved/restored... Well, well, well. If you want a reg-reg architecture, you

So far we seem to agree (give or take a few registers) that the issue
requires more research, and that C on CISC is (admittedly) a more favourable
case for my position than FORTRAN on RISC, etc... But here is something very
interesting (sorry for quoting so much, I'd have summarized, but I have become
wary of that):

    No.  The registers are frequently used. I said the issue was subtle.
    In a leaf routine (on an R3000, but very similar on others):
    	1) One need not save/restore the return address
    	2) Most (or usually all) of the local variables get grabbed into
    		scratch registers that need not be saved.
    	3) Now, the stack frame has evaporated, and so we need not move
    	the stack pointer around, and we already usually didn't have
    	a frame pointer.
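
The three points above can be illustrated with a small C sketch. The
function names are hypothetical, and what a given compiler actually does
with them (including inlining the leaf, which would change the picture) is
an assumption, not something this code can demonstrate by itself.

```c
/* Leaf routine: it calls nothing. Its three arguments and one local are
   few enough that a compiler with plenty of scratch registers can
   plausibly keep them all in registers, with no return address to save
   and no stack frame to set up. */
int clamp(int x, int lo, int hi)
{
    int r = x;

    if (r < lo)
        r = lo;
    if (r > hi)
        r = hi;
    return r;
}

/* Non-leaf routine: it calls clamp(), so its return address and the
   values live across the call (v, n, lo, hi, sum, i) have to survive the
   call somehow, in callee-saved registers or in a stack frame. */
int clamped_sum(const int *v, int n, int lo, int hi)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += clamp(v[i], lo, hi);
    return sum;
}
```

Comparing the generated assembly for the two functions on a given machine
is the way to see how much of the frame really evaporates.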

This is a very good argument, so far, for an AMD 29k style very large
register file, that becomes a statically managed first level memory, or for
a SPARC style set of (less statically managed) windows. When the register
file is very large, you are really almost dealing with a machine with fast
and slow stores, one of which is addressed e.g. with 8 bit word addresses,
and the other with 32 bit byte addresses. The rules change dramatically. It
looks like old CYBERs (even the problem of swapping in/out the fast memory
on context switches). I happen to like the MIPS precisely because it has
NOT gone this route. I also like the Transputer because it has taken this route,
but seriously (4kbyte onchip fast memory -- "registers" if you prefer).

    Since leaf routines are often about 50% of the dynamic function calls,
    this is relevant, and a similar, albeit less strong effect happens on
    others.  Having plenty of scratch registers also means you can pass a
    reasonable number of arguments in registers, avoiding doing stores in
    the caller, and loads in the callee.

    The point is, that a lot of load/store traffic around function calls
    disappears if you have enough registers and smart compilers (whether or
    not you have windows, which of course, can get rid of a few more).  Fast
    machines hate loads, because they usually cost you stall cycles.

    >pay the price, you take your chances. Me, my idea of RISC is a (mostly) zero
    >address architecture with 8/12 bit instructions, and four (to avoid extra
    >push/pop pairs in multiplexing a single one for the up to four independent
    >computations) arith stacks.
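
To show what is meant by several arith stacks, here is a toy interpreter in
C. Everything specific in it is a hypothetical assumption made up for this
sketch: the 8 bit encoding (2 bits of stack number, 2 bits of opcode, 4 bits
of immediate or target stack), the four opcodes, the stack depth, and all
the names. It is not a description of any real machine.

```c
#include <assert.h>
#include <stddef.h>

enum { NSTACKS = 4, DEPTH = 16 };
enum { OP_PUSH = 0, OP_ADD = 1, OP_MUL = 2, OP_MOVE = 3 };

struct machine {
    int stack[NSTACKS][DEPTH];
    int sp[NSTACKS];                /* number of items on each stack */
};

/* Hypothetical encoding, one byte per instruction: ss oo aaaa
   ss = stack operated on, oo = opcode, aaaa = immediate or target stack. */
#define INSN(s, op, arg) \
    ((unsigned char)(((s) << 6) | ((op) << 4) | ((arg) & 0xF)))

void machine_init(struct machine *m)
{
    int s;

    for (s = 0; s < NSTACKS; s++)
        m->sp[s] = 0;
}

void stk_push(struct machine *m, int s, int v)
{
    assert(m->sp[s] < DEPTH);
    m->stack[s][m->sp[s]++] = v;
}

int stk_pop(struct machine *m, int s)
{
    assert(m->sp[s] > 0);
    return m->stack[s][--m->sp[s]];
}

void run(struct machine *m, const unsigned char *code, size_t n)
{
    size_t pc;

    for (pc = 0; pc < n; pc++) {
        int s   = code[pc] >> 6;
        int op  = (code[pc] >> 4) & 3;
        int arg = code[pc] & 0xF;
        int a, b;

        switch (op) {
        case OP_PUSH: stk_push(m, s, arg); break;
        case OP_ADD:  b = stk_pop(m, s); a = stk_pop(m, s);
                      stk_push(m, s, a + b); break;
        case OP_MUL:  b = stk_pop(m, s); a = stk_pop(m, s);
                      stk_push(m, s, a * b); break;
        case OP_MOVE: stk_push(m, arg & 3, stk_pop(m, s)); break;
        }
    }
}

/* (2 + 3) on stack 0 interleaved with (4 + 5) on stack 1, then the second
   result is moved over and the two are multiplied on stack 0. */
const unsigned char demo[] = {
    INSN(0, OP_PUSH, 2), INSN(1, OP_PUSH, 4),
    INSN(0, OP_PUSH, 3), INSN(1, OP_PUSH, 5),
    INSN(0, OP_ADD, 0),  INSN(1, OP_ADD, 0),
    INSN(1, OP_MOVE, 0), INSN(0, OP_MUL, 0),
};
```

The two sums are interleaved instruction by instruction without either one
disturbing the other's operands, which is what the extra stacks are for;
on a single stack that interleaving would need extra shuffling.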

    This is a fine OPINION; the current round of new computer architectures
    has voted widely, and decisively, for load-store machines with "plenty"
    of registers addressable at any point in the program. (plenty = usually
    32, as in HP PA, MIPS, SPARC, MC88000, i860).

Again, proof-by-numbers. The current round of computer users, it could be
said, have voted decisively for segmented CISC architectures :-( :-(. Also,
it is not entirely an opinion; the only novelty I am citing is having
multiple arith stacks, but, while you say:

    In particular, although I've always admired the old B5500, it seems that
    zero-address architectures are difficult to build to really go fast...

there is the little matter of a few FACTs, called CRISP, NOVIX and
TRANSPUTER, that always seem to be forgotten by advocates of reg-reg and
other RISC designs (not to mention the 32532, which has the extremely
embarrassing property of being a simple, well designed, reg-mem CISC that
outruns most RISCs around...).

Declaration of prejudice: I am all (well, 80% :->) for RISC. Of these I find
MIPS more admirable than most. The idea of simple, fast, reliable, is what I
like. Obviously, from my armchair, I disagree that the benefits of RISC come
from ALL the design choices of most RISCs.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk