Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!cs.utexas.edu!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Register usage [was Re: 80486 vs. 68040 code size]
Summary: Knee of which curve? :->
Message-ID: <926@aber-cs.UUCP>
Date: 9 May 89 22:40:28 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: eunet,world
Organization: Dept of CS, UCW Aberystwyth
	(Disclaimer: my statements are purely personal)
Lines: 36

In article <25127@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
    
    >A static analysis of 495 functions shows that an average of 6.6 global
    >registers and an average of 7.0 local registers are used per function,
    :
    >	 11:  2.83% (14)         11:  3.03% (15)
    :
    
    An interesting set of figures.  Using 32 general purpose registers, with
    16 for local and 16 for temporaries, would certainly seem to fit, given where
    the knee of the curve is.

Note that this is NOT the curve '# of regs' vs. 'code size' or 'program speed'.
It is the curve '# of regs' vs. 'max # of regs that a given optimizer can make
any use of across several procedures'. Therefore 32 registers seems to be an
UPPER BOUND on the number of registers that may be useful in the worst case.
    
    Anyway, I wonder what the results look like for things like double
    precision: Linpack, the Livermore Loops, the NAS kernels, etc.
    (In other words, 64 bit floating point numeric codes...)  ...?

This would be interesting to see. I suspect that more registers would
be nice, but these codes are usually vectorizable, in which case one
should use vector instructions on vector registers...

Hint: The number of scratch registers a compiler finds *useful* for
optimizing is more or less directly related to the maximum number of
subexpressions that can be computed concurrently at any one time.  In
other words, if that number is four, then in the typical
statement/expression the data dependencies are such that at most four
subexpressions could be computed concurrently. In normal programs, it is
hard to see how this implicit degree of concurrency could be raised much.
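
To make the hint above a bit more concrete, here is a rough sketch (mine,
not taken from any real compiler) that measures the "implicit degree of
concurrency" of a single expression. It assumes the expression is a binary
tree written as nested tuples (op, left, right) with variable names as
leaves, and it counts how many operators sit at the same dependency level,
i.e. how many could in principle be computed at once. The name
peak_concurrency is hypothetical, and this is only a crude proxy: a real
optimizer works on whole procedures and must also count live values.

```python
from collections import Counter

def peak_concurrency(expr):
    """Largest number of operator nodes sharing one dependency level.

    expr is either a variable name (str) or a tuple (op, left, right).
    This is an assumed toy representation, not a real compiler IR.
    """
    per_level = Counter()

    def level(node):
        # A leaf is already available, so it sits at level 0.
        if isinstance(node, str):
            return 0
        _op, left, right = node
        # An operator can start only after both operands are ready.
        lvl = 1 + max(level(left), level(right))
        per_level[lvl] += 1
        return lvl

    level(expr)
    return max(per_level.values(), default=0)

# (a+b)*(c+d) + (e+f)*(g+h): the four inner additions are independent.
balanced = ("+",
            ("*", ("+", "a", "b"), ("+", "c", "d")),
            ("*", ("+", "e", "f"), ("+", "g", "h")))

# ((a+b)+c)+d: each addition depends on the previous one.
chain = ("+", ("+", ("+", "a", "b"), "c"), "d")

print(peak_concurrency(balanced))  # 4
print(peak_concurrency(chain))     # 1
```

The balanced expression could keep four scratch registers busy at once,
while the chain never has more than one subexpression in flight, however
many registers the machine offers.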
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk