Path: utzoo!attcan!uunet!husc6!bbn!rochester!pt.cs.cmu.edu!sei!sei.cmu.edu!firth
From: firth@sei.cmu.edu (Robert Firth)
Newsgroups: comp.arch
Subject: Re: Procedure Call Protocol
Message-ID: <7743@aw.sei.cmu.edu>
Date: 17 Nov 88 20:07:13 GMT
References: <3300037@m.cs.uiuc.edu> <5938@killer.DALLAS.TX.US> <7580@aw.sei.cmu.edu> <3926@omepd>
Sender: netnews@sei.cmu.edu
Reply-To: firth@bd.sei.cmu.edu (Robert Firth)
Organization: Carnegie-Mellon University, SEI, Pgh, Pa
Lines: 90

In article <3926@omepd> mcg@omepd (Steven McGeady) writes:

>In general I ignore Mr. Firth's gratuitous slams on the 960 (why does
>he have it out for just our processor?

The example I normally use in "slamming" processor designs is the
DEC VAX, whose makers I guess either don't read this newsgroup or
are less sensitive.

However, it might be appropriate to say that I think machine design
is very hard, and I have great respect for the people who do it.
Moreover, many of the features I "slam" are, in my view, not the
result of poor hardware engineering but of good hardware engineers
who have been given bad advice by rather less good software engineers.

>The 960 does a good deal *more* with its call and
>return instructions that Mr. Firth has noted, not a good deal less.

I don't think so.  Let us take your detail

>	1) the stack is adjusted for saving procedure-local registers,
>	   the frame pointer is adjusted, and the previous value of
>	   the frame-pointer is saved
>	2) procedure-local registers are saved in the local register cache
>	3) the return intruction pointer is loaded with the address of
>	   the next instruction in the caller

I agree that if you have register windows, you buy the extra work of
managing them.  That is a different issue.

However, the protocol I posted did all of the above that was
necessary, and in addition

. passed a static link for access to non-locals
. allocated procedure local storage
. included the overhead of calling a parametric procedure or
  procedure variable.
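
For readers unfamiliar with the terms, here is a minimal sketch in C of
what a static link and a procedure variable amount to.  It is an
illustration only, not the MIPS protocol from the earlier post: the
names outer_frame, proc_value, inner, call_proc and outer are all
invented for the example, and C has no nested procedures, so the
static link is passed as an explicit extra argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame of an enclosing procedure "outer". */
struct outer_frame {
    int base;   /* a local of outer, non-local from inner's point of view */
};

/* A procedure value: a code address plus the static link to pass on call. */
struct proc_value {
    int (*code)(const struct outer_frame *static_link, int arg);
    const struct outer_frame *static_link;
};

/* "inner" is conceptually nested in outer; it reaches outer's local
   through the static link it is handed on entry. */
static int inner(const struct outer_frame *static_link, int arg)
{
    assert(static_link != NULL);
    return static_link->base + arg;
}

/* Calling a procedure variable: pick up the static link, then call
   through the stored code address. */
static int call_proc(struct proc_value p, int arg)
{
    return p.code(p.static_link, arg);
}

/* outer allocates its local storage, forms a procedure value for
   inner, and calls it indirectly. */
static int outer(int base, int arg)
{
    struct outer_frame frame = { base };
    struct proc_value p = { inner, &frame };
    return call_proc(p, arg);
}
```

With these definitions, outer(40, 2) reaches inner through the
procedure value and returns 40 + 2.  The point of the sketch is only
that the indirect call carries two words, the code address and the
link, which is the overhead the list above refers to.
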

[I've shown in previous posts why I believe it is not necessary to have
both a frame pointer and a stack front pointer, and how one can avoid
having to save the caller's frame pointer.]

>"Well!" I hear you cry.  "What if I don't want to do all of that?"
>There is an easy answer.  Mr. Firth fails to mention the 'bal' (branch
>and link) instruction in the 960, the exact analog of the MIPS 'jal'

So I did.  The alternative protocol requires a bal at the point of call
and a bx at the point of return.  That's absolutely all it does, at an
average cost of about five cycles for the pair.  Since my post claimed
that a full protocol on the MIPS cost, on average, just one cycle more,
I'm not quite sure whose case is strengthened by this mention.

>Mr. Firth wisely neglects to count in his 12 cycle cost the cost of saving
>procedure-local registers to the stack.

Because it is a different issue.  However, for the record:

On the MIPS, if you are passed parameters in registers, and if you
are not a leaf procedure, you must save the parameters to your
local memory, at a cost of one cycle per register saved.

On the 80960, the register windows do not overlap, so you are
passed parameters in the global registers.  If you are not a
leaf procedure, you must save them by copying them from global
to local registers, at a cost of one cycle per register saved.

The cost is the same in both cases.  For Mr McGeady to add the
cost (and an inflated estimate, too) to the MIPS case only is
inappropriate.

>Nevertheless, with these techniques and those
>Mr. Firth outlines, you can further reduce this time, *if you have an
>extremely sophisticated compiler*.

Again, I believe that the issue of global optimisation is not germane.
For that reason, my original post confined the optimisations to purely
local ones that can be performed by a very simple code generator; indeed
by a one-pass code generator, which is what does perform them.
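
To put the register-saving comparison above in concrete terms, a small
sketch in C follows.  It encodes only the rule stated in this post (one
cycle per register parameter saved by a non-leaf procedure).  The
function names are hypothetical, and the leaf case is assumed to cost
nothing on the grounds that a leaf has no need to save its parameters.

```c
#include <assert.h>

/* MIPS: a non-leaf procedure stores each register parameter to its
   local memory, one cycle per register (figure taken from the post). */
static unsigned mips_save_cycles(unsigned nparams, int is_leaf)
{
    return is_leaf ? 0u : nparams;
}

/* 80960: a non-leaf procedure copies each parameter from a global
   register to a local register, again one cycle per register. */
static unsigned i960_save_cycles(unsigned nparams, int is_leaf)
{
    return is_leaf ? 0u : nparams;
}

/* Under the stated rule the two costs agree for every parameter count. */
static void check_same_cost(unsigned nparams)
{
    assert(mips_save_cycles(nparams, 0) == i960_save_cycles(nparams, 0));
    assert(mips_save_cycles(nparams, 1) == i960_save_cycles(nparams, 1));
}
```

The two functions are deliberately identical: whatever figure one
charges for saving parameters, it has to be charged to both machines
or to neither.
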

Note, by contrast, that Mr McGeady's call=>bal optimisation is a
non-local optimisation, since it requires the caller to have
information about the procedure called.  For that reason, Intel
were right to do the work in the linker.

>So, if MIPS did just as well with no call/return, why did we implement
>it?  There are a number of reasons not touched above:

Many thanks for that information, which I found interesting and
valuable.

It was my hope, in the original posting, that other readers would find
of value a detailed worked example of a single topic, implemented on a
single machine.  If instead the result was to cause distress or anger,
I apologise.

Robert Firth