Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!cornell!uw-beaver!rice!titan!preston From: preston@titan.rice.edu (Preston Briggs) Newsgroups: comp.arch Subject: Re: 80486 vs. 68040 code size [really: how many regs] Message-ID: <3249@kalliope.rice.edu> Date: 11 May 89 19:03:19 GMT References: <927@aber-cs.UUCP> Sender: usenet@rice.edu Reply-To: preston@titan.rice.edu (Preston Briggs) Distribution: eunet,world Organization: Rice University, Houston Lines: 156 Why I like registers 1) My code generators have three parts: instruction selection, register allocation, and instruction scheduling. These are all difficult problems; each is NP-Complete. Additionally, they interfere with each other. The best instruction depends on the register class of the operands, pipeline scheduling increases register lifetimes, and so forth. Lots of registers provides a simplifying separation of concerns. With adequate registers: I can choose instructions without without worrying about running out of a particular class; I can schedule instructions before register allocation without the artificial anti-dependences introduced by register allocation and without worrying about eccessively long register lifetimes; and I can use some sort of global (intra-routine) coloring allocator to avoid ad hoc local methods. 2) The above arguments generally apply to integer registers. I also like lots of FP registers. These are harder to use; generally FP values are hidden in arrays where optimizers can't safely get at them. By using (more expensive) dependence-based optimization, a variety of transformation can be applied to use (very profitably) almost any number of FP registers, particularly with "typical, numeric Fortran." From local experiments, we want more than 16 FP registers, perhaps 32 is adequate. For examples, see "Estimating and improving interlock for pipelined architectures" by Callahan, Cocke, and Kennedy. Proceedings of the 1987 International Conference on Parallel Processing, August 87. Or "Compiling C for Vectorization, Parallelization, and Inline Expansion" by Allen and Johnson, SIGPLAN 88. Or (more to the point, but hard to get?) "Why even scalar machines need vector compilers" by Allen and Lew, TR, Ardent Computer, January 88. 3) An experiment. An integer Fortran(!) program. A non-recursive version of quicksort. An experimental, optimizing compiler for the IBM RT (16 integer registers, 1 is stack pointer). Compiler does partial redundancy elimination (global common subexpressions and loop invarients), value numbering over extended basic blocks, and dead code elimination. (No strength reduction or global constant propagation). Uses a Chaitin-style, graph coloring, global register allocator. With 16 registers, it sorts 200,000 integers in 8.2 seconds. regs spills run-time obj-size --------------------------------------------------------- 16 3 live ranges 8.2 seconds 360 bytes 14 5 8.3 368 12 8 8.7 400 10 13 10.0 440 8 17 13.2 464 Some caveats, I only tried this one integer program (as a part of another study), maybe our register allocator isn't the best for just a few registers, and so forth... On the other hand, our allocator does a good job and lots of FP intensive programs suggest that we could often effectively use many more than 16 integer registers. Finally, let's talk about optimization. Generally, optimization competes with register allocation -- optimization lengthens live ranges and increases register pressure. I think that optimization by the compiler vs. the programmer is a Good Thing. It lets the programmer write good code more quickly and compactly. For example, consider the simple C statements for (i=0; i