Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!ames.arc.nasa.gov!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: Register usage [was Re: 80486 vs. 68040 code size]
Message-ID: <25254@ames.arc.nasa.gov>
Date: 10 May 89 21:12:35 GMT
References: <926@aber-cs.UUCP>
Sender: usenet@ames.arc.nasa.gov
Distribution: eunet,world
Organization: NASA - Ames Research Center
Lines: 67

In article <926@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>In article <25127@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
> > 11: 2.83% (14) 11: 3.03% (15)
 :
> the knee of the curve is.

>Note that this is NOT the curve '# of regs' vs. 'code size' or 'program speed'.
>It is the curve '# of regs' vs. 'max # of regs that a given optimizer can make
>any use of in several procedures'. Therefore 32 registers seem to be an
>UPPER BOUND on the number of registers that in the worst case may be useful.

I assume that any purpose the optimizer can use the registers for is
legitimate, as long as it significantly reduces memory traffic. That
includes using the registers as a "cache", if you like to think of it
that way.

One Fortran engine, the Cyber 205 (the constraints are different from C,
of course), had 256 registers. It turned out that this was "too many" -
often many of the registers went unused.

I always assume that the limiting factor is the optimizer. The now common
rule of thumb that "32 registers is enough" is based on basic block
optimization in C, generally. Really smart optimizers might be able to
effectively use more registers. The whole idea of "RISC" is to design
around what compilers can actually do, as opposed to what they might do
if they were smart enough. Code size is generally not as much of a
factor, although it does affect speed indirectly (e.g. less dense code
requires a bigger I-cache, etc.).

The 205 optimizer assigned all non-common scalars to registers permanently.
I doubt if this really saved memory traffic, because it meant that all
the scalar variables got swapped in even if they were not used in that
subroutine invocation. But it did make use of a fast swap, and also
permitted certain register-to-register operations involving scalars to
take place in parallel with vector operations. Therefore, it may
actually have saved time. In general, a dynamic analysis would be needed
to determine the optimum register assignments for each module.

In the case of the 205, I would guess that around 64 registers would
probably have covered almost all of the cases. Register assignments that
I looked at rarely used more than 50-60. I note that the 205 calling
convention did define scratch registers not saved beyond procedure calls
or, if I recall correctly, beyond basic block optimization (not sure
about the last statement). I think 10 (?) scratch registers were
allocated, and these were not always enough. 16 would have been better,
at least for Fortran.

> > Anyway, I wonder what the results look like for things like double

>This would be interesting to see. I suspect that more registers would
>be nice, but then all these codes are usually vectorizable, and then one
>should use vector instructions on vector registers...

I didn't know that it was REQUIRED to have a vector machine to run
Fortran programs :-)

>Hint: The number of scratch registers a compiler finds *useful* for
>optimizing is more or less related directly to the maximum number of
>subexpressions that can be computed concurrently at any one given time. In

The reason I brought it up is that the analyses one usually sees in this
group are based on C. Fortran codes MAY (I don't know if they do) have
significantly different behavior in this respect for various reasons.
Therefore, it might be worth checking to make sure that the same
assumptions still apply. It might be the case that 64 32-bit registers
are a better fit for the suggested Fortran examples.
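
To make the quoted hint about concurrently computed subexpressions a bit
more concrete, here is a minimal sketch of the standard Sethi-Ullman
(Ershov) labeling for a single expression tree. It is not taken from
either poster or from any 205 compiler; the names Node, regs_needed,
leaf, add and mul are made up for illustration, and it assumes a
load/store machine where every operand must sit in a register.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Expression tree node: a leaf (variable) or a binary operator."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def regs_needed(node: Node) -> int:
    """Registers needed to evaluate the tree without spilling
    (assumption: all operands must be loaded into registers)."""
    if node.left is None or node.right is None:
        return 1  # a leaf occupies one register once loaded
    l = regs_needed(node.left)
    r = regs_needed(node.right)
    # Equal needs: one extra register holds the first result while the
    # second subtree is evaluated. Otherwise evaluate the costlier
    # subtree first and reuse its registers for the cheaper one.
    return l + 1 if l == r else max(l, r)


def leaf(name: str) -> Node:
    return Node(name)


def add(a: Node, b: Node) -> Node:
    return Node("+", a, b)


def mul(a: Node, b: Node) -> Node:
    return Node("*", a, b)


# (a + b) * (c + d): two independent subexpressions must coexist.
balanced = mul(add(leaf("a"), leaf("b")), add(leaf("c"), leaf("d")))
# ((a + b) + c) + d: same four operands, but a serial chain.
chain = add(add(add(leaf("a"), leaf("b")), leaf("c")), leaf("d"))

print(regs_needed(balanced))  # 3
print(regs_needed(chain))     # 2
```

The two trees use the same four operands, yet the balanced one needs a
third register because two partial results are alive at once. This only
covers one expression; whether long Fortran expressions and loop bodies
push the count toward 64 is exactly the kind of thing that would need
measuring.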

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035
  Phone:  (415)694-6117