Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!cs.utexas.edu!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Register usage [was Re: 80486 vs. 68040 code size]
Summary: Knee of which curve? :->
Message-ID: <926@aber-cs.UUCP>
Date: 9 May 89 22:40:28 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: eunet,world
Organization: Dept of CS, UCW Aberystwyth
	(Disclaimer: my statements are purely personal)
Lines: 36

In article <25127@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:

    >A static analysis of 495 functions shows that an average of 6.6 global
    >registers and an average of 7.0 local registers are used per function,
    :
    >  11:   2.83% (14)         11:   3.03% (15)
    :
    An interesting set of figures.  Using 32 general purpose registers,
    with 16 for local and 16 for temporaries, would certainly seem to fit,
    given where the knee of the curve is.

Note that this is NOT the curve '# of regs' vs. 'code size' or 'program
speed'. It is the curve '# of regs' vs. 'max # of regs that a given
optimizer can make any use of in several procedures'. Therefore 32
registers seem to be an UPPER BOUND on the number of registers that in
the worst case may be useful.

    Anyway, I wonder what the results look like for things like double
    precision: Linpack, the Livermore Loops, the NAS kernels, etc. (In
    other words, 64 bit floating point numeric codes...) ...?  This would
    be interesting to see.

I suspect that more registers would be nice, but then all these codes
are usually vectorizable, and then one should use vector instructions on
vector registers...

Hint: The number of scratch registers a compiler finds *useful* for
optimizing is more or less related directly to the maximum number of
subexpressions that can be computed concurrently at any one given time.
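
As a rough illustration of that count, here is a minimal sketch, not
anything taken from the measurements quoted above. It assumes an
expression is given as a tree, places each operation at the earliest
step at which both of its operands are ready, and reports how many
operations share the busiest step. The tuple representation and the
names level and max_concurrency are hypothetical, and earliest-step
placement is only one simple way of estimating the concurrency.

```python
from collections import Counter

# Hypothetical representation: an expression is either a string
# (an operand such as "a") or a tuple (op, left, right).

def level(expr, widths):
    """Return the earliest step at which expr can be computed, and
    record in widths how many operations land on each step."""
    if isinstance(expr, str):
        return 0  # plain operands are available at step 0
    _, left, right = expr
    step = 1 + max(level(left, widths), level(right, widths))
    widths[step] += 1
    return step

def max_concurrency(expr):
    """Largest number of operations that share one step."""
    widths = Counter()
    level(expr, widths)
    return max(widths.values(), default=0)

# (a*b + c*d) * (e*f + g*h)
expr = ("*",
        ("+", ("*", "a", "b"), ("*", "c", "d")),
        ("+", ("*", "e", "f"), ("*", "g", "h")))

print(max_concurrency(expr))  # 4: the four inner multiplications
```

A left-leaning chain such as ((a+b)+c)+d gives 1 under the same
measure, since every operation waits on the one before it.
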
In other words, if four is that number, that means that in the typical
statement/expression the data dependencies are such that at most four
subexpressions could be computed concurrently. In normal programs, it is
hard to see how this implicit degree of concurrency could be raised much.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk