Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!cs.utexas.edu!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: 80486 vs. 68040 code size [really: how many regs]
Message-ID: <927@aber-cs.UUCP>
Date: 9 May 89 23:13:39 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: eunet,world
Organization: Dept of CS, UCW Aberystwyth
	(Disclaimer: my statements are purely personal)
Lines: 101

In article <19063@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
    
    The simplicity of source statements has little to do with the number of
    registers desirable, unless the only compiler you have generates code
    on a statement-by-statement basis only, i.e., no optimization.
					    ^^^^^^^^^^^^^^^^^^^^^
Optimization is not just (and maybe not even most importantly)
inter-statement...

    For example, consider a typical RISC (i.e., load/store), and the C stmts:
    	a = b + 5;
    	c = b + 7;
	[ .... ]

Your example works, but under special-case assumptions: that you are working
on a reg-reg architecture, whereas we were discussing reg-mem ones; and that
putting all three of a, b, c in registers is worthwhile because they are going
to be used heavily in other parts of the program.

The reg-reg assumption may actually point at one of the weaknesses of such
architectures: since computing with some of your operands in memory carries a
high fixed cost, you tend to want to keep everything in regs, even variables
that are used little. In a reg-mem architecture, little-used variables left in
memory do not carry costs as high when you use them.
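
A rough way to put numbers on the point above is to count instructions per
use of a variable. This is only a sketch under stated assumptions: the three
function names are hypothetical, the counts cover reads of a single variable
only, and instruction count is not the same thing as cycles or code bytes (a
memory-operand instruction is typically longer than a reg-reg one).

```c
#include <assert.h>
#include <limits.h>

/* Load/store (reg-reg) machine, variable left in memory:
 * every use needs an explicit load plus the ALU instruction. */
unsigned instrs_loadstore_in_memory(unsigned uses)
{
    assert(uses <= UINT_MAX / 2);   /* guard against overflow */
    return 2 * uses;
}

/* Reg-mem machine, variable left in memory:
 * the ALU instruction names the memory operand directly. */
unsigned instrs_regmem_in_memory(unsigned uses)
{
    return uses;
}

/* Either machine, variable kept in a register:
 * one load up front, then one ALU instruction per use.
 * (Save/restore traffic at calls is deliberately ignored here.) */
unsigned instrs_in_register(unsigned uses)
{
    assert(uses < UINT_MAX);        /* guard against overflow */
    return 1 + uses;
}
```

Under this toy model a variable used twice costs 4 instructions from memory on
the load/store machine, 3 if kept in a register, and 2 from memory on the
reg-mem machine, which is why the pressure to put everything in regs is
stronger in the reg-reg case.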
    	
    Why don't you like inter-expression register assignments?

Well, I like them, as long as the compiler does not do them, but the
programmer does, by using explicit "register" declarations. But let's not
resurrect the comp.lang.c debate here, and not now (it will restart in
comp.lang.c, as soon as I can reload my notes from then... :-/ :-/).
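
For readers who have not met the idiom, here is a minimal sketch of what
programmer-directed register assignment looks like in C. The function name
sum_array is hypothetical, chosen only for illustration; note also that
"register" is a hint which a compiler is free to ignore.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints starting at v. The programmer, not the compiler, nominates
 * the two heavily used variables as register candidates. */
long sum_array(const int *v, size_t n)
{
    register const int *p = v;   /* hot: advanced on every iteration */
    register long total = 0;     /* hot: updated on every iteration  */
    const int *end;              /* used once per iteration, no hint */

    assert(v != NULL);
    end = v + n;
    while (p < end)
        total += *p++;
    return total;
}
```

One consequence of the declaration is that the address of p or total cannot
be taken, so the programmer is also promising that these variables are never
aliased.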
    
    A few years ago, we did the experiment of running the number of registers
    up and down to see what happened.  For our machine, for our compilers,
    for whichever benchmarks we did (large programs, but I don't recall which),
    the knee of the curve was in the 24-28 range, for generally-allocatable
    registers.  Both HP and IBM found the same range in independent studies,
    [ .... ]

Uh? This really astonishes me. I would have bet that even for a RISC, even
doing inter-statement optimization, the number was about 8-12 rather than 4
(rationale: 4 scratches + 4 for register variables automatically chosen by
the compiler + 4 for RISC'iness at most).

    However, I would observe that I've looked at tons of object code, and
    the registers get used.

Disclaimer: I have only worked extensively on reg-mem machines so far. For
such machines I beg to differ; my impression is that for intra-statement
optimization four scratch regs are enough, and for inter-statement
optimization ("register" variables) another four are enough. Hence my hunch
that 8 (386), or 3+3 (plus 2 for system work), is a bit tight but still
tolerable, and 16 (68020) is even overabundant.

    Note that they are useful in two distinct ways:

    	1) To evaluate expressions, including global optimizations.

Conceded (as long as the global optimizations are done by the programmer,
or, ahem, are implicit in suitably designed language constructs).

    	2) To have enough scratch registers that many functions need
    	0 (leaf) or 1 register, unless the optimizer decides it's really
    	worth having a bunch of registers.  Note that if you only have
    	X registers available, and you generally need approx. X to do
    	reasonable expression evaluation, you must save/restore a healthy
    	percentage of X registers across function calls, or go completely
    	to callee-save.  Most people with this kind of architecture
    	have found it best to split the registers between callee-save
    	and caller-save.  In our case, we save about 1.6 regs/average
    	function call, across wide range of benchmarks, and that is due to
    	having ENOUGH registers to allow both safe and scratch registers,
    	and still have enough scratch registers to do plenty of evaluation.

    Note that 2) is a subtle issue, easily overlooked; but is very important,
    especially in the "register-window vs non-register-window" wars.

Ahhhhhhhh. What you are saying is that you are using registers as a
statically allocated cache, and that this is good not because they are
frequently used, but because they would otherwise be frequently
saved/restored... Well, well, well. If you want a reg-reg architecture, you
pay the price, you take your chances. Me, my idea of RISC is a (mostly)
zero-address architecture with 8/12 bit instructions and four arith stacks
(four, so that up to four independent computations each get their own stack,
avoiding the extra push/pop pairs needed to multiplex a single one).

Note that it is commonly assumed that RISC == reg-reg, and that load-store ==
reg-reg; neither of these equations is necessarily true, as one could have
RISC == stack-stack or load-store == stack-stack...
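
To make "load-store == stack-stack" concrete, here is a minimal sketch of a
zero-address machine in which only push and pop touch memory and the add
takes no operands at all. Everything in it (the opcode names, struct insn,
run, and the single 16-entry stack) is an assumption made for illustration;
it has one arith stack, not the four suggested above, and it is not any real
instruction set.

```c
#include <assert.h>

/* Hypothetical opcodes: only PUSHM and POPM reference memory. */
enum op { OP_PUSHM, OP_PUSHI, OP_ADD, OP_POPM, OP_HALT };

struct insn {
    enum op op;
    int arg;        /* memory index or immediate; unused by ADD and HALT */
};

#define STACK_MAX 16

/* Interpret prog until OP_HALT, using mem[0..memsize-1] as data memory. */
void run(const struct insn *prog, int *mem, int memsize)
{
    int stack[STACK_MAX];
    int sp = 0;     /* number of live stack entries */

    for (;; prog++) {
        switch (prog->op) {
        case OP_PUSHM:  /* "load": memory -> top of stack */
            assert(sp < STACK_MAX);
            assert(prog->arg >= 0 && prog->arg < memsize);
            stack[sp++] = mem[prog->arg];
            break;
        case OP_PUSHI:  /* immediate -> top of stack */
            assert(sp < STACK_MAX);
            stack[sp++] = prog->arg;
            break;
        case OP_ADD:    /* zero-address: both operands are implicit */
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_POPM:   /* "store": top of stack -> memory */
            assert(sp >= 1);
            assert(prog->arg >= 0 && prog->arg < memsize);
            mem[prog->arg] = stack[--sp];
            break;
        case OP_HALT:
            return;
        }
    }
}
```

With a, b, c at memory indices 0, 1, 2, the quoted statements a = b + 5 and
c = b + 7 become two sequences of PUSHM 1, PUSHI n, ADD, POPM dest: the
machine is load-store in the strict sense, yet has no general registers.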
    
	[ ... me saying that the 386 is faster than the 68020 at the same MHz ... ]
    I'm not sure I necessarily believe the relative performance claim;

Well, I admit I exxxagerated a bit :->; e.g., while I get about 5% more
Dhrystones from my home 386@20MHz than from the Sun3/280@25MHz at work,
the difference is not very significant... I would reckon that overall the
386 is (conservatively) 10-15% faster than the 68020 at the same MHz.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk