Path: utzoo!attcan!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Register usage
Summary: 8 global registers adequate on RISC -- more needed if static analysis done
Message-ID: <952@aber-cs.UUCP>
Date: 15 May 89 18:48:52 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: eunet,world
Organization: Dept of CS, UCW Aberystwyth
	(Disclaimer: my statements are purely personal)
Lines: 109

In article <8905110956.AA12655@decwrl.dec.com> neideck@nestvx.dec.com
(Burkhard Neidecker-Lutz) writes:

    Summary: If your compiler is not antiquated (compared to this one,
    even gcc looks bad), you can use gazillions of registers. The
    speedup is on the order of 30%, so don't hold your breath.

Well, "gazillions" is a bit exaggerated, but 30% is not a bad
improvement; the curve, though, has a very mild slope at the end, so the
knee is fairly early. Read on...

    The compiler used was for the not-so-widely-known DECWRL Titan RISC
    machine, an ECL RISC with 64 non-windowed registers and a single
    cycle load. The compiler does global register allocation (yes,
    global variables in the sense of C) at link time and is a common
    backend for C, Fortran and Modula 2.

If I remember correctly, the decwrl machine is a titan, and is the
favourite plaything of Paul Vixie :-) :-).

    The number of registers does not apply to the expression evaluation
    and address generation registers (this seems to be the 4-7 people
    have been talking about so far)

I am not surprised...

    but those used by the optimizer to hold things. It analyzes all
    scalar variables into non-conflicting groups and tries to allocate
    those to registers. From the introduction:

    "When we use our method for 52 registers, our benchmarks speed up
    by 10 to 25 %. Even with only 8 registers, the speedup can be
    nearly that large, if we use previously collected profile
    information to guide the allocation.

I thank you enormously. This quote greatly cheers me up.
This is very good support for my position on "register" (where a
competent programmer is expected to do this, with great simplification
of the compiler, because compilers cannot possibly know which variables
are the most frequently used dynamically).

    We cannot do much better, because programs whose variables all fit
    in registers rarely speed up by more than 30 %."

Excellent. I can surmise that this is simply because there are not that
many hot spots in a program...

    [ ... many interesting numbers ... ]

    This shows that his scheme is very efficient in removing these
    memory references. Please note that, given the enormous "hit rate"
    he has and the not so impressive speedups he got, the overall
    percentage of scalar memory references cannot be that big versus
    accesses to bigger data structures.

I am not surprised either. It is the old rule of hot spots on one hand,
and of the desirability of using vector architectures for vector code...

    Now the interesting tables. What happens if you use fewer
    registers? The following table shows the speed improvements with
    52, 32 and 8 registers. All of these performance measures are the
    relative improvement the programs took with global register
    allocation guided by profile information relative to "naive
    register allocation".

                     52      32       8
        -------------------------------
        Livermore    19 %    18 %    12 %
        Whetstone    10 %    10 %     5 %
        Linpack      13 %    13 %    10 %
        Stanford     28 %    27 %    20 %
        Simulator    16 %    15 %     8 %
        Verifier     19 %    16 %     7 %

This does not surprise me either. I would go of course for 8
registers.... Note that these 8 global registers chosen using profiling
probably would be more than plenty instead of just adequate on a
reg-mem machine...

I remain solid in my reckoning that for a reg-mem machine 8 registers
overall is just adequate, and 16 is plenty, with some increment
required for reg-reg.
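
For what it is worth, the table above lends itself to a quick
back-of-the-envelope check of how much of the 52-register improvement
survives with only 8 profile-guided registers. This is only a sketch:
the numbers are copied from the quoted table, while the names
`speedup_pct` and `retained_fraction` are made up here for illustration
and come from neither the paper nor the original article.

```python
# Speed improvements (percent) copied from the quoted table,
# keyed by benchmark, for 52, 32 and 8 global registers.
speedup_pct = {
    "Livermore": (19, 18, 12),
    "Whetstone": (10, 10, 5),
    "Linpack":   (13, 13, 10),
    "Stanford":  (28, 27, 20),
    "Simulator": (16, 15, 8),
    "Verifier":  (19, 16, 7),
}

def retained_fraction(row):
    """Fraction of the 52-register improvement still obtained with 8."""
    with_52, _with_32, with_8 = row
    return with_8 / with_52

# Print one line per benchmark: name and retained fraction.
for name, row in speedup_pct.items():
    print(f"{name:<10} {retained_fraction(row):.2f}")
```

On these figures the 8-register configuration keeps between roughly a
third (Verifier) and roughly three quarters (Linpack) of the
52-register improvement, while going from 32 to 52 registers changes
most rows by a point or less.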

    There is another very interesting paper by David comparing register
    window schemes of varying organization with this global allocation
    stuff and this seems to suggest that a slightly bigger global
    register file beats register windows

The 29k guys will cheer...

    if you are willing to use these extremely advanced compilation
    techniques.

Or the "register" keyword, and you are a competent programmer. Possibly
extended to global variables...

    The paper appeared in Proc. of the SIGPLAN 1988 Conference on
    Programming Language Design and Implementation, June 1988. The
    paper's title is "Register Windows vs. Register Allocation". It's
    way too long to reproduce here and the graphics in there are much
    nicer than anything I can type here.

It is very good indeed. But the question is always of course whether a
statically allocated large register file can still be called a register
file, and not rather a first level memory; if it is so large that you
can store essentially all the variables into it for virtually all of
the program (i.e. use it part as data and part as stack), then I beg to
submit that you have a two level mem-mem architecture.

Maybe, as the AMD 29,000 has a couple hundred registers, the AMD 290,000
will have two thousand, and paging/swapping of registers, etc... :-).
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk