Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!lll-winken!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Register usage [was Re: 80486 vs. 68040 code size]
Summary: 'can use' is not the same as 'can efficiently use'
Message-ID: <921@aber-cs.UUCP>
Date: 9 May 89 13:58:56 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: eunet,world
Organization: Dept of CS, UCW Aberystwyth
	(Disclaimer: my statements are purely personal)
Lines: 65

In article <25546@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:

    |     	Highly optimizing compilers have long been able to make very
    |     	good use of more than four registers.

	[ .... ]
    
    OK, here are some figures to play with:

	[ .... ]
    
    A static analysis of 495 functions shows that an average of 6.6 global
      ^^^^^^
    registers and an average of 7.0 local registers are used per function,
							^^^^
    with the following register-use histogram: (ain't 'awk' wonderful? ;-)

Too bad that these figures don't mean anything, except that your compiler
can 'make use of more than four registers'. The 'very good' after 'make' is
not proved at all. To prove that, you need to generate code assuming that you
have, say, 1 to 16 registers available, and then show that as the number of
registers increases, program speed/code size improves significantly.
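
What "improves significantly" would look like can be sketched roughly as
below. This is only an illustration of the kind of analysis meant, not
anybody's actual harness: the function name `knee`, the `min_gain` threshold
and the shape of the input (a non-empty mapping from register count to a
measured, strictly positive cost such as run time or code size) are all
assumptions made here.

```python
from typing import Dict


def knee(cost_by_regs: Dict[int, float], min_gain: float = 0.02) -> int:
    """Return the first register count at which the step to the next
    measured count reduces cost by less than `min_gain` (a fraction).

    `cost_by_regs` maps "registers made available to the allocator" to a
    measured cost (seconds, bytes, ...). It is assumed to be non-empty
    and to hold strictly positive costs.
    """
    counts = sorted(cost_by_regs)
    for prev, cur in zip(counts, counts[1:]):
        # Relative improvement obtained by going from `prev` to `cur` registers.
        gain = (cost_by_regs[prev] - cost_by_regs[cur]) / cost_by_regs[prev]
        if gain < min_gain:
            return prev
    # Every step still paid off: the largest count measured is the answer.
    return counts[-1]
```

Feeding it the measurements for 1 to 16 registers would show where the curve
flattens; a histogram of how many registers the compiler happens to touch
says nothing about that.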

The one paper I read about this (unfortunately for John Mashey I cannot
find the exact reference -- the reason is too embarrassing, even if not for
me, to state publicly) was about taking the PCC (for the PDP) and changing the
number of registers available to its Sethi-Ullman register allocator, and
then benchmarking a few Unix tools.

They found that in these conditions (CISC machine, no interexpression
optimization, virtually only fixed point computation) speed/code size did
not improve substantially with more than three scratch registers, and four
were plenty.
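
For reference, the Sethi-Ullman labelling that such an allocator relies on
can be sketched as follows. This is a minimal illustration in Python, not
PCC's actual code: the `Node` class, the `registers_needed` function and the
`mem_operands` flag are names invented here, only binary operators are
assumed, and no use is made of commutativity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Expression tree node: a leaf (variable) or a binary operator."""
    op: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def registers_needed(node: Node, mem_operands: bool = True,
                     is_left: bool = True) -> int:
    """Sethi-Ullman label: registers needed to evaluate `node` without spills.

    mem_operands=True models a reg-mem machine, where a right-hand leaf can
    be used straight from memory and so costs no register.
    mem_operands=False models a load/store machine, where every leaf must
    first be loaded into a register.
    """
    if node.left is None and node.right is None:  # leaf
        if mem_operands and not is_left:
            return 0
        return 1
    l = registers_needed(node.left, mem_operands, True)
    r = registers_needed(node.right, mem_operands, False)
    # Equal needs on both sides cost one extra register to hold the first
    # result; otherwise evaluate the hungrier side first and reuse registers.
    return l + 1 if l == r else max(l, r)
```

Under these assumptions (a+b)*(c+d) is labelled 2 on a reg-mem machine and 3
on a load/store one, which gives an idea of how small the numbers stay for
expressions of ordinary size.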

I can imagine that for machines not like the 386/68020, e.g. RISC machines
with a reg-reg architecture, more registers may be useful, but as far as I
know there are no figures for this situation. This is an interesting
research project: take GCC for the SPARC, and redo the exercise. Or the AMD
29k compiler, or the MIPS compiler suite, etc...

I still find it hard to believe that one would find a substantial difference
(especially given the abundant statistics on the simplicity of the average
expression -- expressions with more than two operators are a rarity) and
indeed the AMD data above seem to say that seven registers is about what a
compiler can use (for expression optimization). This, let me say, looks like
four registers + three for local "register" variables :->.

As to the six global registers, their contribution is hard to assess. On one
point, though, I agree: global "register" variables (which C unfortunately
does not have, thus forcing the compiler to intuit them) are demonstrably
good in one important case, namely when the program to which they are global
uses them to cache the state of some automaton, e.g. an interpreter.

In the end, I see here nothing against the idea that 16 total registers is
plenty and 8 adequate if a bit constraining, but the difference is not going
to be great... All the more so in the original context: CISC reg-mem machines,
with code size as the metric.

As for me, I am very fond of zero address architectures, with the tip of the
stack (let's make it four tips of stack) cached. I like CRISP architectures...
:-> :->.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk