Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!amdcad!crackle!tim
From: tim@crackle.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Register usage [was Re: 80486 vs. 68040 code size]
Message-ID: <25546@amdcad.AMD.COM>
Date: 8 May 89 02:08:05 GMT
References: <907@aber-cs.UUCP>
Sender: news@amdcad.AMD.COM
Reply-To: tim@amd.com (Tim Olson)
Distribution: eunet,world
Organization: Advanced Micro Devices, Inc. Sunnyvale CA
Lines: 80

In article <907@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
| In article <10979@polyslo.CalPoly.EDU> cquenel@polyslo.CalPoly.EDU writes:
|     
|     	Highly optimizing compilers have long been able to make very
|     	good use of more than four registers.
| 
| Well, in a long discussion a few months ago in comp.lang.c, nobody has been
| able (in a period of two months) to quote any FIGURES that support this and
| other urban legends (e.g. Elvis is alive and designed the Z80000 :->). There
| are plenty of figures though about the average extreme simplicity of actual
| statements/expressions/constructs in algorithmic level languages, which does
| not bode well for the usefulness of large register files.

OK, here are some figures to play with:

The Am29000 calling convention says that global temporary registers are
killed across a procedure call, while values in local registers remain
alive.  When the Am29000 compilers perform lifetime analysis for
register assignment, they note which values must be alive across a
procedure call and assign those to local registers; the others are
assigned to global registers.
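
The assignment rule above can be sketched roughly as follows.  This is
only an illustration, not the actual Am29000 compiler code: the
LiveRange type, the instruction-index representation of lifetimes, and
the function name classify are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class LiveRange:
    name: str
    start: int  # index of the instruction that defines the value (assumed)
    end: int    # index of the last instruction that uses it (assumed)

def classify(ranges, call_sites):
    """Split values into 'local' (live across a call) and 'global' (not)."""
    result = {"local": [], "global": []}
    for r in ranges:
        # A value is live across a call if it is defined before the call
        # and still used after it; a use at the call itself does not count.
        spans_call = any(r.start < c < r.end for c in call_sites)
        result["local" if spans_call else "global"].append(r.name)
    return result

# Hypothetical function body with a single call at instruction 5.
ranges = [LiveRange("a", 0, 9), LiveRange("t1", 1, 3), LiveRange("t2", 6, 8)]
print(classify(ranges, call_sites=[5]))
```

Here only "a" straddles the call, so it would go in a local register,
while the two short-lived temporaries could share global registers.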

A static analysis of 495 functions shows that an average of 6.6 global
registers and an average of 7.0 local registers are used per function,
with the following register-use histogram: (ain't 'awk' wonderful? ;-)

	495 total functions
	ave globals: 6.5596     ave locals: 7.04242
	--- globals ---         --- locals ---
	  0:  1.82% (9)           0:  1.01% (5)
	  1:  0.81% (4)           1:  5.66% (28)
	  2:  7.27% (36)          2:  9.29% (46)
	  3:  4.24% (21)          3: 13.94% (69)
	  4: 10.51% (52)          4: 14.95% (74)
	  5:  9.90% (49)          5:  8.08% (40)
	  6: 17.58% (87)          6: 10.71% (53)
	  7: 15.56% (77)          7:  7.47% (37)
	  8:  8.08% (40)          8:  3.84% (19)
	  9: 11.72% (58)          9:  3.84% (19)
	 10:  4.85% (24)         10:  4.85% (24)
	 11:  2.83% (14)         11:  3.03% (15)
	 12:  1.41% (7)          12:  1.01% (5)
	 13:  0.40% (2)          13:  1.01% (5)
	 14:  1.21% (6)          14:  1.41% (7)
	 15:  0.00% (0)          15:  0.81% (4)
	 16:  0.81% (4)          16:  0.40% (2)
	 17:  0.20% (1)          17:  0.81% (4)
	 18:  0.00% (0)          18:  0.40% (2)
	 19:  0.20% (1)          19:  1.41% (7)
	 20:  0.40% (2)          20:  0.61% (3)
	 21:  0.00% (0)          21:  1.62% (8)
	 22:  0.00% (0)          22:  0.61% (3)
	 23:  0.00% (0)          23:  0.20% (1)
	 24:  0.20% (1)          24:  0.00% (0)
	>24:  0.00% (0)         >24:  3.03% (15)
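
For anyone who wants to play with the numbers, the globals column can
be re-summarized with a few lines of Python (a sketch, not the awk
script used for the table; the function names are made up here).  The
counts are copied from the histogram.  The locals column cannot be
averaged the same way, because its ">24" bucket hides the actual
values.

```python
# Function counts from the "globals" column, indexed by register count 0..24.
GLOBAL_COUNTS = [9, 4, 36, 21, 52, 49, 87, 77, 40, 58, 24, 14, 7,
                 2, 6, 0, 4, 1, 0, 1, 2, 0, 0, 0, 1]

def mean_registers(counts):
    """Average registers per function, weighting each bucket by its count."""
    total = sum(counts)
    return sum(n * c for n, c in enumerate(counts)) / total

def fraction_using_more_than(counts, limit):
    """Fraction of functions that use more than `limit` registers."""
    return sum(counts[limit + 1:]) / sum(counts)

print("functions:", sum(GLOBAL_COUNTS))
print("ave globals:", round(mean_registers(GLOBAL_COUNTS), 4))
print("using more than 4 globals: %.2f%%"
      % (100 * fraction_using_more_than(GLOBAL_COUNTS, 4)))
```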

With a stack cache, the cost of saving and restoring the live registers
across procedure calls is constant up to the spill/fill boundary, while
with a simple global register file, the cost depends upon the number of
live registers (1 store + 1 load per live register).

If we assume an average of 7 live registers and 1.5% to 2.5% calls in
the dynamic instruction mix, live register save and restore would
account for an extra 21% ([7 stores + 7 loads]/66 instructions between
calls) to 35% ([7 stores + 7 loads]/40 instructions between calls) of
execution time, assuming that loads and stores take 1 cycle.  However,
leaf-procedure optimizations (which we also perform in the current
Am29000 compilers) can reduce the number of calls which must save live
registers by perhaps 30%, resulting in an overall execution time
increase of 14.7% (0.7 x 21%) to 24.5% (0.7 x 35%).
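
The arithmetic behind these estimates can be reproduced with a small
sketch.  It assumes, as above, one cycle per instruction (loads and
stores included) and treats the 30% leaf-procedure figure as a plain
scaling factor; the function and parameter names are invented for the
example.

```python
def save_restore_overhead(live_regs, call_fraction, leaf_reduction=0.0):
    """Extra execution time (as a fraction) spent saving and restoring
    live registers, assuming every instruction takes 1 cycle."""
    cycles_per_call = 2 * live_regs           # 1 store + 1 load per register
    instrs_between_calls = 1 / call_fraction  # e.g. 1.5% calls -> ~66
    overhead = cycles_per_call / instrs_between_calls
    # Leaf-procedure optimization removes a share of the saving calls.
    return overhead * (1 - leaf_reduction)

for frac in (0.015, 0.025):
    base = save_restore_overhead(7, frac)
    leaf = save_restore_overhead(7, frac, leaf_reduction=0.30)
    print(f"{frac:.1%} calls: {base:.1%} without, {leaf:.1%} with leaf opt")
```

Changing live_regs or call_fraction shows how quickly the cost grows
for a simple global register file as more values stay live across
calls.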

Now, the Am29000 compilers perform CSE at a very low level (we assume
that registers are cheap, loads and stores expensive), so we probably
have a somewhat higher number of live registers across function calls
than other architectures.  I'm sure someone at MIPS can supply us with
their equivalent numbers.

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)