Path: utzoo!news-server.csri.toronto.edu!rutgers!ub.d.umn.edu!cs.umn.edu!ux.acs.umn.edu!vx.acs.umn.edu!dhoyt
From: dhoyt@vx.acs.umn.edu (DAVID HOYT)
Newsgroups: comp.arch
Subject: Re: RE: register save
Message-ID: <3592@ux.acs.umn.edu>
Date: 11 Mar 91 21:33:37 GMT
References: <1991Mar11.192116.1974@dgbt.doc.ca>
Sender: news@ux.acs.umn.edu
Reply-To: dhoyt@vx.acs.umn.edu
Organization: University of Minnesota, Academic Computing Services
Lines: 36
News-Software: VAX/VMS VNEWS 1.3-4

In article <1991Mar11.192116.1974@dgbt.doc.ca>, don@dgbt.doc.ca (Donald McLachlan) writes...
>	The only way I can think to generalise this would be to always
>put the return address in a dedicated register. This would require that
>the "call" would first push the old contents onto the stack and then
>load in the new return address. The matching return would use in the
>dedicated register as the return address. The function making the call
>would then be responsible for grabbing the old return address off the
>stack and loading it into the dedicated register.

  Imagine you have, say, 32 registers. On a procedure call, r7 holds the
return address and r0..r6 hold the first seven parameters. Now if our code
looks like this

    save r7                          | sub::
    for( i = 0; i < 10000000; i++ )  |   ld r0, r1 ; add 1, r1 ; stor r0, r1
      sub( a + i )                   |   jmp r7
    restore r7

we've done 10,000,001 reads and the same number of writes: one load and one
store per call inside sub, plus a single save and restore of r7 around the
loop. Now if we had used the VAX jsb/rsb instructions (one write and one read
of the return address per call), we would have 20m reads and 20m writes,
doubling the memory accesses. Obviously, even with a good cache, we'd much
rather do our subroutine calls the first way than the VAX way. And the normal
CALLS instruction that the VAX uses for subroutine calls is much more
expensive than jsb/rsb. We could greatly speed up subroutine calls on a VAX
by using 4 of its 16 registers this way, or even 8 of the 16, I suspect. In
my example sub() is just a normal subroutine, so we don't need a smart linker
or code inliner.
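  The read/write counts above can be checked with a small model. This is only
a sketch: the function name count_memory_accesses and its parameters are made
up for illustration, and it assumes the body of sub does exactly one load and
one store per call, as in the example. It compares the return-address-in-r7
scheme with a scheme that pushes and pops the return address on every call.

```python
def count_memory_accesses(calls, body_reads=1, body_writes=1):
    """Return (reads, writes) for each call convention.

    Assumption: the subroutine body does body_reads loads and
    body_writes stores per call (one of each in the example).
    """
    # Return address kept in r7: the caller saves r7 once before the
    # loop (1 write) and restores it once afterwards (1 read). The
    # calls themselves touch memory only inside the subroutine body.
    link_register = (1 + calls * body_reads, 1 + calls * body_writes)

    # Return address on the stack: every call pushes it (1 write) and
    # every return pops it (1 read), on top of the body's accesses.
    stack = (calls * (body_reads + 1), calls * (body_writes + 1))

    return {"link_register": link_register, "stack": stack}


counts = count_memory_accesses(10_000_000)
for name, (reads, writes) in counts.items():
    print(f"{name:14s} reads={reads:,} writes={writes:,}")
```

  The gap grows with the number of calls and shrinks as the body does more
memory traffic of its own, so the factor of two is specific to a tiny sub.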
  In fact, the only penalty of the all-register procedure call, compared with
inlining the body, is two additional (unconditional) branches: the jump to sub
and the jmp r7 back. With a small direct-mapped instruction buffer (a la Cray)
and a branch-intelligent pipeline, the execution-time cost of the two branches
would be close to nothing. That basically gives you inline virtual functions
that we wouldn't even have to declare inline. Everybody wins, wee ha!

david | dhoyt@vx.acs.umn.edu | dhoyt@umnacvx.bitnet