Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!samsung!think!mintaka!mit-eddie!uw-beaver!uw-june!pardo From: pardo@cs.washington.edu (David Keppel) Newsgroups: comp.sys.m88k Subject: Re: Register Allocation (was Re: Info about 88open & standards) Summary: long Message-ID: <10007@june.cs.washington.edu> Date: 30 Nov 89 18:25:18 GMT References: <1989Nov16.212149.9770@paris.ics.uci.edu> <100050002@hpcuhc.HP.COM> <9972@june.cs.washington.edu> <7263@sybase.sybase.com> Reply-To: pardo@june.cs.washington.edu (David Keppel) Distribution: na Organization: University of Washington, Computer Science, Seattle Lines: 113 > pardo@june.cs.washington.edu (David Keppel) writes: >>On the RISC-family processors, you'll probably >>lose a lot. Is it worth adding save-by-mask hardware? Probaly not. In article <7263@sybase.sybase.com> tim@binky.UUCP (Tim Wood) writes: >Please subtantiate these assertions. A couple other people have already; allow me to summarize. A RISC has [is defined as having] a load/store architecture, one instruction issued per clock cycle. Save-by-mask in hardware is non-RISC. Implementing save-by-mask in software requires several instructions to be executed for each (potential) save. Between the blowup in code size and the extra instructions, you're probably better off blindly saving n registers. Remember that conditionally saved registers must also be conditionally restored at function return. A (long) example Suppose callee saves 8, caller saves 8. The compiler will leave caller saves registers empty (don't need saving) aroudn function calls. The caller could thus needlessly save up to 8 registers. If the callee needs registers it will try to use registers that were saved by the callee. If that fails it will save up to 8 registers on its own, but those registers might be dead in the caller, so up to 8 registers could be saved needlessly. The wasted caller-saves and wasted callee-saves can never overlap. At most 8 registers are saved needlessly. If load/store is pipelined, then the worst-case code looks like, say add sp 32 sp store r1 0[sp] store r2 4[sp] store r3 8[sp] store r4 12[sp] store r5 16[sp] store r6 20[sp] store r7 24[sp] store r8 28[sp] call nop load 28[sp] r8 ... subl sp 32 sp That's 19 clock ticks wasted. The symmetric case for callee saves similarly wastes 19 clock ticks. Now consider save-by-mask [r1 to r16 can are saved by mask.] [mask passed in rN, shadow is rM.] [Destination is rightmost operand.] [No branch delay slots.] function: jmp !mask done r1: and mask 1 shadow jmp !shadow r2 save r1 r2: and mask 2 shadow jmp !shadow r2 save r2 jmp !mask done r3: ... r16: and mask 2*^15 shadow jmp !shadow done save r16 done: (Note: you may well be able to do better than this. This is an example of one sequence for one hypothetical machine.) The worst case is 16*3 = 48 instructions, but if jumps introduce more than one bubble, then it could blow up to about 16*4 = 64 instructions. Let's go with 48. On average you should have to save about half the registers (a guess). A clever optimzier will put registers in low-number registers, so that you don't generally have to go all the way to r16 in order to save half the registers. So on average you'll hit about 24 instructions in the prolog and about 24 instructions in the epilog, or about 48 instructions. Of course the register saves are highly machine- and application-dependent. Suppose that generally you have to save only 2 registers and that they are r1 and r3. Then you'll execute 9 instructions on entry and 9 instructions on exit and you're only doing 1 save better than the *worst* case of the naive caller-saves-some and callee-saves-some! Of course this all needs to be parameterized on a machine-by-machine basis, and also on an application-by-application basis. * Average number of registers to save. * Average number of saves that are wasted. * Cost to blindly do split caller/callee saves * Cost to do save-by-mask. If you really care, you should start by reading the stuff availabile via anonymous ftp from june.cs.washington.edu (128.95.1.4) in tmp/pardo/proc.call.txt ;-D on ( Lies, damn lies, and USENET ) Pardo -- pardo@cs.washington.edu {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo