Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!lll-winken!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Register usage [was Re: 80486 vs. 68040 code size]
Summary: 'can use' is not the same as 'can efficiently use'
Message-ID: <921@aber-cs.UUCP>
Date: 9 May 89 13:58:56 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: eunet,world
Organization: Dept of CS, UCW Aberystwyth
	(Disclaimer: my statements are purely personal)
Lines: 65

In article <25546@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:

    | Highly optimizing compilers have long been able to make very
    | good use of more than four registers.

    [ .... ]

    OK, here are some figures to play with:

    [ .... ]

    A static analysis of 495 functions shows that an average of 6.6 global
                                                                    ^^^^^^
    registers and an average of 7.0 local registers are used per function,
                                                        ^^^^
    with the following register-use histogram:

    (ain't 'awk' wonderful? ;-)

Too bad that these figures don't mean anything, except that your
compiler can 'make use of more than four registers'. The 'very good'
after 'make' is not proved at all. To prove that, you need to generate
code assuming that you have, say, 1 to 16 registers available, and then
show that as the number of registers increases, program speed/code size
improves significantly.

The one paper I read about this (unfortunately for John Mashey I cannot
find the exact reference -- the reason is too embarrassing, even if not
for me, to state publicly) was about taking the PCC (for the PDP) and
changing the number of registers available to its Sethi-Ullman register
allocator, and then benchmarking a few Unix tools. They found that in
these conditions (CISC machine, no interexpression optimization,
virtually only fixed point computation) speed/code size did not improve
substantially with more than three scratch registers, and four were
plenty.

I can imagine that for machines not like the 386/68020, e.g.
RISC machines with a reg-reg architecture, more registers may be useful,
but as far as I know there are no figures for this situation. This is an
interesting research project: take GCC for the SPARC, and redo the
exercise. Or the AMD 29k compiler, or the MIPS compiler suite, etc...
I still find it difficult to believe that one would find a substantial
difference (especially given the abundant statistics on the simplicity
of the average expression -- expressions with more than two operators
are a rarity) and indeed the AMD data above seem to say that seven
registers is about what a compiler can use (for expression
optimization). This, let me say, looks like four registers + three for
local "register" variables :->.

As to the six global registers, their contribution is hard to assess.
But about them let me say that I agree on one thing: global "register"
variables (which unfortunately C does not have, thus forcing the
compiler to intuit them) are demonstrably good in one important case,
when the program to which they are global uses them to cache the state
of some automaton, e.g. an interpreter.

In the end, I see here nothing against the idea that 16 total registers
is plenty and 8 adequate if a bit constraining, but the difference is
not going to be great... All the more so in the original context: CISC
reg-mem machines, with code size as the metric.

As for me, I am very fond of zero address architectures, with the tip of
the stack (let's make it four tips of stack) cached. I like CRISP
architectures... :-> :->.
-- 
Piercarlo "Peter" Grandi    | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
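
To make the Sethi-Ullman point above concrete, here is a minimal sketch of the textbook labelling for a reg-mem machine. It is only an illustration, not the PCC's actual allocator and not the code from the paper mentioned above. The tuple representation of expressions and the names `su_label` and `balanced` are assumptions made up for this example.

```python
# Textbook Sethi-Ullman labelling for binary expression trees.
# Illustrative only: not PCC's code. A tree is either a string (a variable
# held in memory) or a tuple (op, left, right). Names are hypothetical.

def su_label(tree, is_left=True):
    """Return how many registers are needed to evaluate `tree` without spills.

    Assumes a reg-mem machine: a right-hand leaf can be used directly as a
    memory operand (label 0), while a left-hand leaf must be loaded (label 1).
    """
    if isinstance(tree, str):
        return 1 if is_left else 0
    _op, left, right = tree
    l = su_label(left, True)
    r = su_label(right, False)
    # Unequal needs: evaluate the hungrier side first and reuse its registers.
    # Equal needs: one extra register must hold the first result.
    return max(l, r) if l != r else l + 1


def balanced(depth, names):
    """Build a perfectly balanced tree of `depth` levels of '+' operators."""
    if depth == 0:
        return next(names)
    return ("+", balanced(depth - 1, names), balanced(depth - 1, names))


if __name__ == "__main__":
    simple = ("+", "a", "b")                         # a + b
    two_ops = ("*", ("+", "a", "b"), "c")            # (a + b) * c
    wide = ("*", ("+", "a", "b"), ("+", "c", "d"))   # (a + b) * (c + d)
    for t in (simple, two_ops, wide):
        print(t, "->", su_label(t))
    for depth in range(1, 5):
        names = (f"v{i}" for i in range(2 ** depth))
        print("balanced depth", depth, "->", su_label(balanced(depth, names)))
```

Under this labelling a one- or two-operator expression needs a single scratch register, and an expression only needs four once it is a balanced tree of 16 operands. This says nothing about register variables or interexpression optimization, which the labelling does not model.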