Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!rutgers!apple!versatc!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: 80486 vs. 68040 code size [really: how many regs]
Message-ID: <19413@winchester.mips.COM>
Date: 11 May 89 04:48:22 GMT
References: <927@aber-cs.UUCP>
Reply-To: mash@mips.COM (John Mashey)
Distribution: eunet,world
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 134

In article <927@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>In article <19063@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>    
>    The simplicity of source statements has little to do with the number of
>    registers desirable, unless the only compiler you have generates code
>    on a statement-by-statement basis only, i.e., no optimization.
>					    ^^^^^^^^^^^^^^^^^^^^^
>Optimization is not just (and maybe even not most importantly) inter
>statement...
>
>    For example, consider a typical RISC (i.e., load/store), and the C stmts:
>    	a = b + 5;
>    	c = b + 7;
>	[ .... ]
>
>Your example works, but under special case assumptions: that you are working on
>a reg-reg architecture, whereas we were discussing reg-mem ones; that putting
>all three a,b,c in registers is worthwhile because they are going to be used
>heavily in other parts of the program.
	These are not special case assumptions, although, of course the
	example was put together to illustrate the point.
	1) These days, many machines are reg-reg architectures.
	2) Optimizers do find this kind of stuff, especially in FORTRAN
	codes or some of the larger hunks of C.  They don't have to be used
	heavily, they just have to be used enough to make it worthwhile.
	3) I didn't realize the discussion was limited to reg-mem
	architectures: I recall the following part of a posting:
-------
>Even some of the RISC guys, and some that do a great deal of inter-expression
>register assignments (which I don't like at all -- long life "register"!),
>like MIPS, don't see a great advantage in extravagant register file sizes.
>
>	If John Mashey could give us some of the numbers that MIPS have
>	surely got on where they estimate the knees of the curves for
>	registers for intra and inter statement optimization to be, we would
>	be greatly enlightened (e.g. as to why their register file is so
>	different from that of SPARC and 29K); maybe he has already, and I
>	have missed them.
--------
I did, although I'm sorry it wasn't more.  It's not something we have
much motivation to keep current, unlike the plethora of other statistics
that we keep around.  Should be good paper topic for somebody.
It IS important to account for other issues, and there's no reason
that the answer should be the same for other architectural designs.
I.e., if you end up with memory ops, rather than load-store, you're less
motivated to have more registers [you'd think].  On the other hand,
even there, you sometimes win because of cycle-count (not instruction-count)
issues, i.e., the latency cycle(s) you get on most machines from fetching
data from memory (even cache memory).

>The reg-reg assumption actually may point at one of their weaknesses, that
>since the cost of computing with parts of your operands in memory has a high
>fixed cost, you tend to want to store everything in regs, even if they are
>used little. In a reg-mem architecture little use variables in memory do not
>carry costs as high when you use them.
You'd be surprised, especially in heavily-pipelined machines.  You must be
thinking of counting INSTRUCTIONS, not cycles: most fast (i.e., seriously
pipelined) machines cost you a stall cycle if you want to fetch something
from memory and use it right away, so even on a machine with mem->reg
operations, you might choose to sometimes generate a load, followed by an
op, because you might be able to rearrange code and get something in to
cover the load latency. [people sometimes found this on the S/370s].
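
A rough sketch of the cycles-vs-instructions point, in C.  The machine
model here (one cycle per instruction, one stall cycle when an instruction
uses the result of the load immediately before it) and every name in the
code are assumptions made up for illustration; this is not a description
of the R3000, the S/370, or any other real pipeline.

```c
#include <stddef.h>

/* Toy in-order pipeline model (hypothetical, for illustration only):
 * each instruction issues in one cycle, and an instruction that reads
 * the result of the load immediately before it pays one stall cycle. */
enum { NO_REG = -1 };

struct insn {
    int is_load;   /* nonzero if this instruction loads from memory */
    int dest;      /* register written, or NO_REG */
    int src1;      /* registers read, or NO_REG */
    int src2;
};

/* Count cycles for a straight-line sequence under the toy model. */
int count_cycles(const struct insn *code, size_t n)
{
    int cycles = 0;
    for (size_t i = 0; i < n; i++) {
        cycles += 1;                              /* issue cycle */
        if (i > 0 && code[i - 1].is_load &&
            code[i - 1].dest != NO_REG &&
            (code[i].src1 == code[i - 1].dest ||
             code[i].src2 == code[i - 1].dest)) {
            cycles += 1;                          /* load-use stall */
        }
    }
    return cycles;
}

/* Each load is followed at once by the add that uses it. */
int naive_cycles(void)
{
    const struct insn code[] = {
        { 1, 1, NO_REG, NO_REG },   /* load r1        */
        { 0, 3, 1, 2 },             /* add  r3,r1,r2  */
        { 1, 4, NO_REG, NO_REG },   /* load r4        */
        { 0, 6, 4, 5 },             /* add  r6,r4,r5  */
    };
    return count_cycles(code, sizeof code / sizeof code[0]);
}

/* Same four instructions, with the second load moved up so that it
 * covers the latency of the first. */
int scheduled_cycles(void)
{
    const struct insn code[] = {
        { 1, 1, NO_REG, NO_REG },   /* load r1        */
        { 1, 4, NO_REG, NO_REG },   /* load r4        */
        { 0, 3, 1, 2 },             /* add  r3,r1,r2  */
        { 0, 6, 4, 5 },             /* add  r6,r4,r5  */
    };
    return count_cycles(code, sizeof code / sizeof code[0]);
}
```

Under this toy model both sequences are four instructions long, but
naive_cycles() pays a stall after each load and scheduled_cycles() pays
none, so counting instructions alone hides the difference.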
    	
>    Why don't you like inter-expression register assignments?

>Well, I like them, as long as the compiler does not do them, but the
>programmer does, by using explicit "register" declarations....

OPINION: the above statement sends me back to 15-20 years ago....
really, if you believe this, you are not keeping up with what's
happening in the computer business.

>Disclaimer: I have only worked extensively on reg-mem machines so far. For
>such machines I beg to differ.
As noted, there is no "right" number for every architecture and
language; you'll get away with fewer registers in C than some of the
others.
>
>    Note that they are useful in two distinct ways:
>
>    	1) To evaluate expressions, including global optimizations.
>
>    	2) To have enough scratch registers that many functions need
>    	0 (leaf) or 1 register, unless the optimizer decides it's really
>    	worth having a bunch of registers.  Note that if you only have
>    	X registers available, and you generally need approx. X to do
>    	reasonable expression evaluation, you must save/restore a healthy
>    	percentage of X registers across function calls, or go completely
>    	to callee-save.  Most people with this kind of architecture
>    	have found it best to split the registers between callee-save
>    	and caller-save.  In our case, we save about 1.6 regs/average
>    	function call, across wide range of benchmarks, and that is due to
>    	having ENOUGH registers to allow both safe and scratch registers,
>    	and still have enough scratch registers to do plenty of evaluation.
>
>    Note that 2) is a subtle issue, easily overlooked; but is very important,
>    especially in the "register-window vs non-register-window" wars.
>
>Ahhhhhhhh. What you are saying is that you are using registers as a
>statically allocated cache, and that this is good not because they are
>frequently used, but because they would otherwise be frequently
>saved/restored... Well, well, well. If you want a reg-reg architecture, you
No.  The registers are frequently used. I said the issue was subtle.
In a leaf routine (on an R3000, but also, very similarly, on others):
	1) One need not save/restore the return address
	2) Most (or usually all) of the local variables get grabbed into
		scratch registers that need not be saved.
	3) Now, the stack frame has evaporated, and so we need not move
	the stack pointer around, and we already usually didn't have
	a frame pointer.
Since leaf routines are often about 50% of the dynamic function calls, 
this is relevant, and a similar, albeit less strong effect happens on others.
Having plenty of scratch registers also means you can pass a reasonable
number of arguments in registers, avoiding doing stores in the caller,
and loads in the callee.  The point is, that a lot of load/store
traffic around function calls disappears if you have enough registers
and smart compilers (whether or not you have windows, which of course,
can get rid of a few more).  Fast machines hate loads, because they
usually cost you stall cycles.
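
To make the leaf/non-leaf distinction concrete, here is a small C sketch.
The function names are hypothetical, and the comments describe what a
compiler for a load-store machine with a split caller-save/callee-save
convention can typically do; what any particular compiler actually emits
is an assumption, not something this code demonstrates.

```c
/* Leaf routine: it calls nothing, so the return address can stay in
 * the link register, and n, sum and i can all live in scratch
 * (caller-save) registers.  With nothing to save, no stack frame
 * needs to be built at all. */
int sum_to(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* Non-leaf routine: each call to sum_to() overwrites the link register
 * and may clobber scratch registers.  So the return address, plus b
 * (live across the first call) and first (live across the second),
 * have to sit in callee-save registers or on the stack. */
int sum_range(int a, int b)
{
    int first = sum_to(a - 1);
    return sum_to(b) - first;
}
```

For instance, sum_range(3, 5) adds 3 + 4 + 5.  Only sum_range has values
that must survive a call; sum_to has none, which is where the save/restore
traffic goes away.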
>pay the price, you take your chances. Me, my idea of RISC is a (mostly) zero
>address architecture with 8/12 bit instructions, and four (to avoid extra
>push/pop pairs in multiplexing a single one for the up to four independent
>computations) arith stacks.
This is a fine OPINION; the current round of new computer architectures has
voted widely, and decisively, for load-store machines with "plenty"
of registers addressable at any point in the program. (plenty = usually
32, as in HP PA, MIPS, SPARC, MC88000, i860).  In particular, although
I've always admired the old B5500, it seems that zero-address architectures
are difficult to build to really go fast...
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086