Path: utzoo!attcan!uunet!portal!cup.portal.com!bcase
From: bcase@cup.portal.com (Brian bcase Case)
Newsgroups: comp.arch
Subject: Re: Register usage [was Re: 80486 vs. 68040 code size]
Message-ID: <18235@cup.portal.com>
Date: 11 May 89 18:54:14 GMT
References: <921@aber-cs.UUCP>
Organization: The Portal System (TM)
Lines: 62

>...you need to generate code assuming that you
>have say 1 to 16 register available, and then show that as the number of
>register increases, program speed/code size improves significantly.

Well, it's old and CISCy stuff, but the paper:

Chow and Hennessy, "Register Allocation by Priority-based Coloring,"
Proc. SIGPLAN Symp. on Compiler Construction, SIGPLAN notices vol. 19,
No. 6, June 1984.

shows some performance numbers for a variable number of registers.  The
target architectures were the PDP-10 and the 68000.  A max. of 9 registers
was available.  The fastest performance was achieved when the max. number
of regs. was used.
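
To make the "more registers, fewer spills" effect concrete, here is a
minimal sketch of priority-ordered coloring.  It is NOT the algorithm from
the paper (which, among other things, splits live ranges and uses a more
careful priority function); the function names, variable names and savings
figures are all made up for illustration.

```python
def build_interference(edges):
    """Turn a list of (a, b) pairs into a symmetric adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def priority_color(savings, interferes, k):
    """Give registers 0..k-1 to live ranges in order of decreasing
    estimated savings.  A live range that finds no free register is
    left in memory.  Returns (assignment, spilled)."""
    assignment = {}
    spilled = []
    # Highest savings first; ties broken by name so the result is stable.
    for v in sorted(savings, key=lambda name: (-savings[name], name)):
        taken = {assignment[u] for u in interferes.get(v, ()) if u in assignment}
        free = [r for r in range(k) if r not in taken]
        if free:
            assignment[v] = free[0]
        else:
            spilled.append(v)
    return assignment, spilled


# Hypothetical live ranges with made-up "cycles saved if in a register".
savings = {"i": 100, "sum": 90, "p": 40, "n": 10, "tmp": 5}
interferes = build_interference([
    ("i", "sum"), ("i", "p"), ("i", "n"), ("i", "tmp"),
    ("sum", "p"), ("sum", "n"), ("sum", "tmp"),
    ("p", "n"),
])

for k in range(1, 5):
    _, spilled = priority_color(savings, interferes, k)
    lost = sum(savings[v] for v in spilled)
    print(k, "regs: spilled", spilled, "savings lost", lost)
```

With these invented numbers the lost savings shrink every time a register
is added, which is the shape of result the paper reports, though the real
measurements are of course on real programs.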

>changing the number of registers available to its Sethi-Ullman register
>allocator, and then benchmarking a few Unix tools.
>They found that in these conditions (CISC machine, no interexpression
>optimization, virtually only fixed point computation) speed/code size did
>not improve substantially with more than three scratch registers, and four
>were plenty.

But of course!  Is this a surprise?  It isn't to me.
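
A rough sketch of why: with allocation done one expression at a time, the
registers needed are bounded by the Sethi-Ullman number of the expression
tree, and that number is tiny for typical expressions.  The tuple-based
tree representation and the function name below are my own assumptions,
and the leaf rule assumes a CISC-style machine where a right-hand leaf can
be used directly as a memory operand.

```python
def su_number(node, is_left=True):
    """Sethi-Ullman label: registers needed to evaluate `node` without
    storing intermediate results.

    A node is either a string (leaf: variable or constant) or a tuple
    (op, left, right).  A left leaf must be loaded into a register (1);
    a right leaf is assumed usable straight from memory (0)."""
    if isinstance(node, str):
        return 1 if is_left else 0
    _, left, right = node
    l = su_number(left, True)
    r = su_number(right, False)
    # Evaluate the more demanding subtree first; only when both sides
    # need the same number of registers is one extra register required.
    return max(l, r) if l != r else l + 1


simple = ("+", "a", "b")                               # a + b
product = ("*", ("+", "a", "b"), ("+", "c", "d"))      # (a+b) * (c+d)
big = ("-", product, ("*", ("+", "e", "f"), ("+", "g", "h")))

print(su_number(simple), su_number(product), su_number(big))  # prints: 1 2 3
```

A balanced tree has to double in size to need one more register, so an
allocator that never looks beyond a single expression has nothing to do
with registers number five and up.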

>I can imagine that for machines not like the 386/68020, e.g. RISC machines
>with a reg-reg architecture, more registers may be useful, but as far as I
>know there are no figures for this situation. This is an interesting
>research project: take GCC for the SPARC, and redo the exercise. Or the AMD
>29k compiler, or the MIPS compilers suite, etc...

The paper quoted above concluded that for the CISC-ish PDP-10 and 68000,
using all 9 available registers was the best (the rest were reserved for 
exclusive use by the code generator, I think).

>I still find it difficult that one would find a substantial difference
>(especially given the abundant statistics on the simplicity of the average
>expression -- expressions with more than two operators are a rarity) and
>indeed the AMD data above seem to say that seven registers is about what a
>compiler can use (for expression optimization). This, let me say, looks like
>four registers + three for local "register" variables :->.

Are you perhaps forgetting that registers can be used in clever ways to
avoid save/restore overhead on procedure calls?  The following paper:

Wall, "Global Register Allocation At Link Time," ACM SIGPLAN Symp. on
Compiler Construction, June 1986 (sorry, I can get a more complete ref.
if anyone wants it).

talks about using 52 registers with link-time allocation (the machine,
the DECWRL Titan, has 64 GPRs).  The allocator tried to keep as many
procedure contexts in registers as possible.  Neat stuff.
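
The idea of keeping several procedure contexts in registers at once can be
sketched as follows.  This is only an illustration in the spirit of the
approach, not the allocator from the paper: it assumes an acyclic call
graph (no recursion), treats every local as equally worth a register, and
all procedure names and counts are invented.

```python
def assign_register_blocks(calls, nlocals, budget):
    """Give each procedure a contiguous block of registers such that a
    procedure's block never overlaps the block of anything it can
    (transitively) call.  Procedures that cannot be active at the same
    time may share registers.

    calls:   proc -> list of callees (must be acyclic)
    nlocals: proc -> number of locals wanting registers
    budget:  registers available to the allocator
    Returns proc -> (first, end_exclusive), or None if kept in memory."""
    blocks = {}
    tops = {}  # proc -> first register free above proc and its callees

    def place(proc):
        if proc in tops:
            return tops[proc]
        # Start above every register any callee chain might be using.
        start = max((place(c) for c in calls.get(proc, ())), default=0)
        end = start + nlocals[proc]
        if end <= budget:
            blocks[proc] = (start, end)
            tops[proc] = end
        else:
            blocks[proc] = None   # does not fit: locals stay in memory
            tops[proc] = start
        return tops[proc]

    for proc in nlocals:
        place(proc)
    return blocks


# Hypothetical program: main calls parse and emit, both of which call lex.
calls = {"main": ["parse", "emit"], "parse": ["lex"], "emit": ["lex"], "lex": []}
nlocals = {"main": 6, "parse": 10, "emit": 4, "lex": 5}

for proc, block in assign_register_blocks(calls, nlocals, 52).items():
    print(proc, block)
```

Here parse and emit overlap in registers because they are never live at
the same time, while main sits above both, so no call in this little
program needs to save or restore anything.  Shrink the budget and the
procedures nearest the root are the first to fall back to memory.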

>As to the six global registers, their contribution is hard to assess. But on
>them let me say that on one thing I agree: global "register" variables (that
>unfortunately C does not have, thus forcing the compiler to intuit them) are
>demonstrably good in one important case, when the program to which they are
>global uses them to cache the state of some automaton, e.g. an interpreter.

For the 29K, they are not used for storing data declared to be in the C
global scope. They are temporaries used for expression evaluation, etc.
If the compiler could put globally-scoped data in global registers (such
as can be done by the DECWRL "at-link-time" stuff), many more could be used.