Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!cs.utexas.edu!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: 80486 vs. 68040 code size [really: how many regs]
Message-ID: <927@aber-cs.UUCP>
Date: 9 May 89 23:13:39 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: eunet,world
Organization: Dept of CS, UCW Aberystwyth (Disclaimer: my statements are purely personal)
Lines: 101

In article <19063@winchester.mips.COM> mash@mips.COM (John Mashey) writes:

    The simplicity of source statements has little to do with the number
    of registers desirable, unless the only compiler you have generates
    code on a statement-by-statement basis only, i.e., no optimization.
                                                 ^^^^^^^^^^^^^^^^^^^^^

Optimization is not just (and maybe even not most importantly) inter
statement...

    For example, consider a typical RISC (i.e., load/store), and the C stmts:
        a = b + 5;
        c = b + 7;
    [ .... ]

Your example works, but under special case assumptions: that you are
working on a reg-reg architecture, whereas we were discussing reg-mem ones;
and that putting all three of a, b, c in registers is worthwhile because they
are going to be used heavily in other parts of the program.

The reg-reg assumption may actually point at one of the weaknesses of such
architectures: since computing with some of your operands in memory carries
a high fixed cost, you tend to want to keep everything in regs, even
variables that are used little. In a reg-mem architecture, little-used
variables kept in memory do not cost as much when you do use them.

    Why don't you like inter-expression register assignments?

Well, I like them, as long as the compiler does not do them, but the
programmer does, by using explicit "register" declarations. But let's not
resurrect the comp.lang.c debate here, and not now (it will restart in
comp.lang.c, as soon as I can reload my notes from then... :-/ :-/).
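
As a minimal sketch of what programmer-directed allocation with explicit
"register" declarations could look like for the two quoted statements: the
function name sum_offsets, the loop around the statements, and the choice of
which variables get the hint are all assumptions made for illustration, not
anything taken from the thread. Note that "register" is only a hint in C; a
compiler is free to ignore it.

```c
/* Hypothetical illustration: the programmer, not the compiler, decides
 * which variables are used heavily enough to deserve a register. */
long sum_offsets(const long *v, int n)
{
    register long b;   /* hint: read twice per iteration */
    register int i;    /* hint: loop counter, used on every iteration */
    long a, c;         /* no hint: placement is left to the compiler */
    long total = 0;

    for (i = 0; i < n; i++) {
        b = v[i];
        a = b + 5;     /* the two statements from the quoted example */
        c = b + 7;
        total += a * c;
    }
    return total;
}
```

On a reg-mem machine, a and c could plausibly stay in memory at modest cost,
while on a load/store machine every operand has to pass through a register
anyway. One side effect of the hint is that the address of b or i can no
longer be taken.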

    A few years ago, we did the experiment of running the number of
    registers up and down to see what happened. For our machine, for our
    compilers, for whichever benchmarks we did (large programs, but I don't
    recall which), the knee of the curve was in the 24-28 range, for
    generally-allocatable registers. Both HP and IBM found the same range
    in independent studies, [ .... ]

Uh? This really astonishes me. I would have bet that even for a RISC, even
doing inter statement optimization, the number was about 8-12 rather than 4
(rationale: 4 scratches + 4 for register variables automatically chosen by
the compiler + 4 for RISC'iness at most).

    However, I would observe that I've looked at tons of object code, and
    the registers get used.

Disclaimer: I have only worked extensively on reg-mem machines so far. For
such machines I beg to differ; my impression is that for intra statement
optimization four scratch regs are enough, and for inter statement
optimization ("register" variables) another four are enough. Hence my hunch
that 8 (386), or 3+3 (plus 2 for system work), is a bit tight but still
tolerable, and 16 (68020) is even overabundant.

    Note that they are useful in two distinct ways:

    1) To evaluate expressions, including global optimizations.

Conceded (as long as the global optimizations are done by the programmer,
or, ahem, are implicit in suitably designed language constructs).

    2) To have enough scratch registers that many functions need 0 (leaf)
    or 1 register, unless the optimizer decides it's really worth having a
    bunch of registers. Note that if you only have X registers available,
    and you generally need approx. X to do reasonable expression
    evaluation, you must save/restore a healthy percentage of X registers
    across function calls, or go completely to callee-save. Most people
    with this kind of architecture have found it best to split the
    registers between callee-save and caller-save.
    In our case, we save about 1.6 regs/average function call, across a
    wide range of benchmarks, and that is due to having ENOUGH registers to
    allow both safe and scratch registers, and still have enough scratch
    registers to do plenty of evaluation.

    Note that 2) is a subtle issue, easily overlooked; but is very
    important, especially in the "register-window vs non-register-window"
    wars.

Ahhhhhhhh. What you are saying is that you are using registers as a
statically allocated cache, and that this is good not because they are
frequently used, but because they would otherwise be frequently
saved/restored...

Well, well, well. If you want a reg-reg architecture, you pay the price, you
take your chances. Me, my idea of RISC is a (mostly) zero address
architecture with 8/12 bit instructions and four arith stacks (four, to
avoid the extra push/pop pairs needed to multiplex a single stack among up
to four independent computations).

Note that it is commonly assumed that RISC == reg-reg, and that
load-store == reg-reg; neither of these equations is necessarily true, as
one could have RISC == stack-stack or load-store == stack-stack...

    [ ... me saying that the 386 is faster than 68020 at same MHz ... ]

    I'm not sure I necessarily believe the relative performance claim;

Well, I admit I exxxagerated a bit :->; e.g., while I get about 5% more
Dhrystones from my home 386@20MHz than from the Sun3/280@25MHz at work, the
difference is not very significant... I would reckon that overall the 386 is
(conservatively) 10-15% faster than the 68020 at the same MHz.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk