Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!ames!eos!shelby!rutgers!bellcore!spectral!sjs
From: sjs@spectral.ctt.bellcore.com (Stan Switzer)
Newsgroups: comp.arch
Subject: Re: Register usage
Message-ID: <16052@bellcore.bellcore.com>
Date: 11 May 89 13:52:27 GMT
References: <921@aber-cs.UUCP>
Sender: news@bellcore.bellcore.com
Reply-To: sjs@ctt.bellcore.com (Stan Switzer)
Distribution: eunet,world
Organization: Bellcore
Lines: 39

In article <921@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
> The one paper I read about this (unfortunately for John Mashey I cannot
> find the exact reference -- the reason is too embarrassing, even if not for
> me, to state publicly) was about taking the PCC (for the PDP) and changing the
> number of registers available to its Sethi-Ullman register allocator, and
> then benchmarking a few Unix tools.
>
> They found that in these conditions (CISC machine, no interexpression
> optimization, virtually only fixed point computation) speed/code size did
> not improve substantially with more than three scratch registers, and four
> were plenty.

My own experience with a C-like compiler for the Honeywell Level/6 bears
this out. As long as you are only doing expression-level code generation,
three registers are, surprisingly enough, quite sufficient. The
Sethi-Ullman "pebbling" scheme (a limited form of coloring) works well in
this case.

Larger register sets pay off when you do dataflow analysis and allocate
registers over spans of code larger than a single C expression statement.
This is where I disagree with Piercarlo: you _can_ make good use of more
registers, but you'll have to do some global analysis first. If you only
have a handful of registers, though, you might as well keep it simple.

The Level/6, if I remember correctly, had 14 registers: 7 integer
(logical) and 7 pointer registers.
Of the integer registers, numbered 1 to 7,
#1-3 could be used as indexes, #1 could be used as a shift count, and
#6-7 could be paired to hold large (32-bit) values. Of the seven pointer
registers, only #1-3 could be used in indexed addressing modes. Because
of this grunginess, my compiler used the first three registers in each
set as expression temps and reserved the rest for dedicated purposes
(frame pointer, return linkage, common storage, etc.). Going beyond
statement-level allocation would probably have been pointless, but a
good peephole optimizer would have helped a bit.

Stan Switzer
sjs@ctt.bellcore.com