Path: utzoo!news-server.csri.toronto.edu!rutgers!ub.d.umn.edu!cs.umn.edu!ux.acs.umn.edu!vx.acs.umn.edu!dhoyt
From: dhoyt@vx.acs.umn.edu (DAVID HOYT)
Newsgroups: comp.arch
Subject: Re: RE: register save
Message-ID: <3592@ux.acs.umn.edu>
Date: 11 Mar 91 21:33:37 GMT
References: <1991Mar11.192116.1974@dgbt.doc.ca>
Sender: news@ux.acs.umn.edu
Reply-To: dhoyt@vx.acs.umn.edu
Organization: University of Minnesota, Academic Computing Services
Lines: 36
News-Software: VAX/VMS VNEWS 1.3-4

In article <1991Mar11.192116.1974@dgbt.doc.ca>, don@dgbt.doc.ca (Donald McLachlan) writes...
>	The only way I can think to generalise this would be to always
>put the return address in a dedicated register. This would require that
>the "call" would first push the old contents onto the stack and then
>load in the new return address. The matching return would use the
>dedicated register as the return address. The function making the call
>would then be responsible for grabbing the old return address off the
>stack and loading it into the dedicated register.

  Imagine you have, say, 32 registers.  On a procedure call, r7 holds the
return address and r0..r6 hold the first seven parameters.  Now suppose our
code looks like this:

    save r7                            | sub::
    for( i = 0; i < 10000000; i++ )    |    ld r0, r1 ; add 1, r1; stor r0, r1
        sub( a + i )                   |    jmp r7
    restore r7

  We've done 10,000,001 reads and the same number of writes: one read and one
write inside sub() per call, plus the single save and restore of r7.  Now if
we used a VAX JSB/RSB pair instead (one write and one read per call for the
return address), we would have 20 million reads and 20 million writes,
doubling the memory accesses.  Obviously, even with a good cache, we'd much
rather make our subroutine calls the first way than the VAX way.  And the
CALLS instruction that the VAX normally uses for subroutine calls is much
more expensive than JSB/RSB.  We could greatly speed up subroutine calls on a
VAX by dedicating 4 of its 16 registers to this convention, or even 8 of 16,
I suspect.

  In my example sub() is just an ordinary subroutine, so we don't need a smart
linker or an inliner.  In fact, the only penalty of the all-register
procedure call is two additional unconditional branches (the call and the
return).  With a small direct-mapped instruction buffer (a la Cray) and a
pipeline that handles branches intelligently, the execution-time cost of
those two branches would be close to nothing.  That basically gives you
inline virtual functions that we wouldn't even have to declare inline.
Everybody wins, wee ha!

david | dhoyt@vx.acs.umn.edu | dhoyt@umnacvx.bitnet