Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!pasteur!ames!oliveb!apple!versatc!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: 80486 vs. 68040 code size [really: how many regs]
Message-ID: <19063@winchester.mips.COM>
Date: 8 May 89 23:19:33 GMT
References: <907@aber-cs.UUCP>
Reply-To: mash@mips.COM (John Mashey)
Distribution: eunet,world
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 86

In article <907@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
....
>Well, in a long discussion a few months ago in comp.lang.c, nobody has been
>able (in a period of two months) to quote any FIGURES that support this and
>other urban legends (e.g. Elvis is alive and designed the Z80000 :->). There
>are plenty of figures though about the average extreme simplicity of actual
>statements/expressions/constructs in algorithmic level languages, which does
>not bode well for the usefulness of large register files.

The simplicity of source statements has little to do with the number of
registers desirable, unless the only compiler you have generates code
on a statement-by-statement basis only, i.e., no optimization.
For example, consider a typical RISC (i.e., load/store), and the C stmts:
    a = b + 5;
    c = b + 7;
If you generate code one statement at a time, and don't do register
allocation, you need just one register:
    load r1,b; add r1,r1,5; store r1,a
    load r1,b; add r1,r1,7; store r1,c
Some simple inter-statement optimization would want 2 regs:
    load r1,b; add r2,r1,5; store r2,a
    add r2,r1,7; store r2,c
And if you were in the middle of code that had more references to these,
and had allocated a: r2, b: r1, c: r3, you now have 3 regs:
    add r2,r1,5; add r3,r1,7
and if these statements were in the middle of a loop someplace where
the optimizer had already identified these as useful expressions to have
around, you could actually get 5 regs used (a,b,c,b+5,b+7), although
the resulting code for these statements might then look like:
    (nothing: later references to a or c would reference the regs
    where b+5 and b+7 were stored)

>Even some of the RISC guys, and some that do a great deal of inter-expression
>register assignments (which I don't like at all -- long life "register"!),
>like MIPS, don't see a great advantage in extravagant register file sizes.

Why don't you like inter-expression register assignments?

> If John Mashey could give us some of the numbers that MIPS have
> surely got on where they estimate the knees of the curves for
> registers for intra and inter statement optimization to be, we would
> be greatly enlightened (e.g. as to why their register file is so
> different from that of SPARC and 29K); maybe he has already, and I
> have missed them.

A few years ago, we did the experiment of running the number of registers
up and down to see what happened. For our machine, for our compilers,
for whichever benchmarks we did (large programs, but I don't recall which),
the knee of the curve was in the 24-28 range, for generally-allocatable
registers.
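
A minimal sketch, not part of the original post, that counts the distinct registers named in each of the three code sequences given earlier for a = b + 5; c = b + 7. It is written in Python purely for convenience. The function name registers_used and the three constants are hypothetical, and the pseudo-assembly is the post's own notation, not a real instruction set.

```python
import re


def registers_used(asm):
    """Return the set of register names (r1, r2, ...) mentioned in asm."""
    # \b stops the 'r' inside words such as 'store' from matching,
    # and \d+ requires at least one digit after the 'r'.
    return set(re.findall(r"\br\d+\b", asm))


# The three sequences from the post, copied as written.
STATEMENT_AT_A_TIME = """
load r1,b; add r1,r1,5; store r1,a
load r1,b; add r1,r1,7; store r1,c
"""

INTER_STATEMENT = """
load r1,b; add r2,r1,5; store r2,a
add r2,r1,7; store r2,c
"""

# Assumes a, b, c already live in r2, r1, r3 (the post's allocation).
PREALLOCATED = """
add r2,r1,5; add r3,r1,7
"""

if __name__ == "__main__":
    for name, asm in [
        ("statement at a time", STATEMENT_AT_A_TIME),
        ("inter-statement", INTER_STATEMENT),
        ("preallocated", PREALLOCATED),
    ]:
        regs = sorted(registers_used(asm))
        print(f"{name}: {len(regs)} register(s): {', '.join(regs)}")
```

The same two source statements give counts of 1, 2, and 3 here. What changes is how much the compiler keeps in registers across statements, not the complexity of either statement.
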
Both HP and IBM found the same range in independent studies, although
I don't think this is published anywhere, it being the kind of data
obtained in bars arguing over architecture. We haven't kept such analyses
around, having already made the decisions relevant thereto.

However, I would observe that I've looked at tons of object code, and
the registers get used. Note that they are useful in two distinct ways:
1) To evaluate expressions, including global optimizations.
2) To have enough scratch registers that many functions need to save
   0 (leaf) or 1 register, unless the optimizer decides it's really worth
   having a bunch of registers.

Note that if you only have X registers available, and you generally need
approx. X to do reasonable expression evaluation, you must save/restore
a healthy percentage of X registers across function calls, or go
completely to callee-save. Most people with this kind of architecture
have found it best to split the registers between callee-save and
caller-save. In our case, we save about 1.6 regs/average function call,
across a wide range of benchmarks, and that is due to having ENOUGH
registers to allow both safe and scratch registers, and still have enough
scratch registers to do plenty of evaluation.

Note that 2) is a subtle issue, easily overlooked, but it is very
important, especially in the "register-window vs non-register-window" wars.

>....
>As to some small bit of available evidence, it is not very controversial
>that the 386 is usually a tad faster than the 68020 with roughly equivalent
>system technology (e.g. a cached 386 at 20Mhz tipically beats a cached 68020
>at 25 Mhz under gcc), and code size is a tad smaller as well. This may not
>mean much, may not be just *because* it has less registers, but it seems to
>indicate that at least the smaller number of registers does not hurt too
>much.

I'm not sure I necessarily believe the relative performance claim;
in any case, I would bet the biggest difference is attributable to the
2-cycle bus access (386) versus 3-cycle access (68020).
--
-john mashey  DISCLAIMER:
UUCP:  {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:   408-991-0253 or 408-720-1700, x253
USPS:  MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086