Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg From: pcg@cs.aber.ac.uk (Piercarlo Grandi) Newsgroups: comp.arch Subject: Number of registers, MIPS, GNU cc Summary: GNU cc on the MIPS with 12, 16, 24, 32 nonfloat registers Message-ID: Date: 4 Jan 91 22:37:04 GMT Sender: cho@aber-cs.UUCP Organization: Coleg Prifysgol Cymru Lines: 185 Nntp-Posting-Host: teachk New Year, New Life. Here is, incredibly enough, some (little) real data. I have had some time over the past few days to make GNU CC 1.37.93 generate code under different assumptions on a MIPS machine (DECsystem 5830). GNU CC's register allocation is driven by a couple of #define's in its machine dependent header (in this case tm-mips.h). The first tells the allocator which registers it can use as scratch registers, and the second tells it which are not available across function calls. Normally a MIPS machine has 32 registers (I am mainly interested in the non floating point ones), of which 25 can be used as spare, and of these only 9 are preserved across function calls. I provided alternative definitions as follows (see source snippet at the end): name total spare preserved U12 12* 5 5 U16 16* 9 9 U24 24* 17 9 U32 32 25 9 The numbers with an asterisk don't mean that I have actually lowered the number of registers visisble to GNU cc to that level -- for convenience I have just marked the balance to 32 "unavailable". It may also be noted that the percentage of spare registers preserved by the callee rises steadly; I might have kept the proportion to about 1/3, but this was not deemed terribly important (I did check and it does not make a lot of difference). So I first compiled GNU CC (this results in three executables, gcc, cpp, and cc1) with the MIPS 'cc -O2', and called the resulting compiler version UMIPS. I then recompiled (various times, actually) the compiler itself with GNU 'cc -O2', and called the resulting version U32G32 (the executable uses 32 registers and generates code for 32 registers). I then compiled regclass.c another three times, for the U12, U16, and U24 options, and obtained another three compiler versions, U32G12, U32G16, and U32G24; all such versions are identical except for the two tables of registers. At this point I could take some benchmark and compile it with U32G12..32 and compare the results. Just to make things interesting I decided to use GNU CC itself as benchmark -- on a whim. So, I recompiled GNU CC another three times, with U32G12, U32G16 and U32G24, and I obtained three new compiler versions called U12G32, U16G32, U24G32. Note that these three compiler versions all generate code for 32 registers to make their work equivalent, but each version uses only 12, 16, 24 registers to do the compilation. Having got all these version, I wanted to look at their text sizes with: size U12G32/gcc U12G32/cpp U12G32/cc1 and here are the results: gcc cpp cc1 U12G32 37,864 90,112 733,184 U16G32 32,768 77,824 655,360 U24G32 32,768 73,728 614,400 U32G32 32,768 73,728 602,112 UMIPS 32,768 77,824 634,880 I then took a few largish source modules of GNU CC itself, and compiled them with the various compilers, with a command line like time ./gcc -BU12G32/ -S -O2 -I. -Iconfig loop.c Note that I said -O2, because all compiles so far had been done with this level of optimization (which is _broadly_ equivalent between GNU cc and MIPS cc), and -S, because I wanted to measure only the speed of gcc, cpp and cc1, not that of the MIPS as. I also measured separately the speed of cpp, for some reason. Here is a table of the user times for compiling loop.c, in seconds (with a variability of say +/- 0.5 seconds), as reported by time(1): cc1 spare normalized U12G32 39.4 5 153% U16G32 30.9 9 120% U24G32 26.4 17 103% U32G32 26.7 25 104% UMIPS 25.6 (?25) 100% Note that I have added the number of spare registers avaiable on the smae line. Compilation times for other sources, or for gcc and cpp, are more or less in the same pattern. There are a number of observations to make; some are: 1) The GNU cc is an exceptionally large 'nonfloat' benchmark; it is not the only possible one, and it is not 'typical'. 2) The GNU cc source is exceptionally intensive in fairly complicated structure manipulations, and has normally quite large function bodies. 3) Even if it has 'register' declarations here and there, the GNU cc source is written in a somewhat haphazard way and with the idea that an optimizer will sort things out. 4) The GNU cc register allocation algorithm makes no attempt to take into account the dynamic frequency of use, even if it does pay some regard to loops. It disregards by default, when optimization is enabled, all 'register' indications. 5) Statistics exist to show that almost never in any stretch of (nonfloat) code there are more than a couple dozen live variables -- the frequently used ones are usually much less than that. 6) The GNU cc is not especially tuned to the MIPS architecture; the MIPS cc is, after all, even if its roots are in some architecture dependent technology. It would be nice if anybody could run a similar test on a 68030 or 68040, or on a SPARC. In any case here is the modified bit of "tm-mips.h": /* 1 for registers that have pervasive standard uses and are not available for the register allocator. On the MIPS, see conventions, page D-2 I have chosen not to take Multiply/Divide HI,LO or PC into account. */ #ifdef REGS12 #define FIXED_REGISTERS {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\ 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,\ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\ 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 \ } #define CALL_USED_REGISTERS {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\ 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,\ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\ 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0\ } #endif #ifdef REGS16 #define FIXED_REGISTERS {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\ 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,\ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 \ } #endif #ifdef REGS24 #define FIXED_REGISTERS {1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,\ 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,\ 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,\ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 \ } #endif #ifdef REGS32 #define FIXED_REGISTERS {1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1,\ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 \ } #endif /* 1 for registers not available across function calls. These must include the FIXED_REGISTERS and also any registers that can be used without being saved. The latter must include the registers where values are returned and the register where structure-value addresses are passed. Aside from that, you can include as many other registers as you like. */ #ifndef CALL_USED_REGISTERS #define CALL_USED_REGISTERS {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\ 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,\ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\ 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0\ } #endif -- Piercarlo Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk