Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!brutus.cs.uiuc.edu!jarthur!elroy.jpl.nasa.gov!ames!pacbell!osc!jgk
From: jgk@osc.COM (Joe Keane)
Newsgroups: comp.arch
Subject: Re: 64-bit addresses
Message-ID: <2068@osc.COM>
Date: 23 Feb 90 12:16:42 GMT
References: <9708@spool.cs.wisc.edu> <20270@cfctech.cfc.com> <11112@encore.Encore.COM> <10795@snow-white.udel.EDU> <2027@osc.COM> <162@gollum.twg.com> <2054@osc.COM> <6190@bd.sei.cmu.edu>
Reply-To: jgk@osc.osc.COM (Joe Keane)
Organization: Object Sciences Corp., Menlo Park, CA
Lines: 40

In article <2054@osc.COM> I write:
>I agree this is hard, but it's an interesting optimization and can only
>improve your performance. Of course it's completely impossible if your
>constants are embedded in the instruction stream.

I'm not sure what the antecedent of `this' is in my post, but what I meant
to refer to is replacing multiple instances of the same constant with
multiple references to one instance. This is what I claim can only help.

The first step, though, is to take the constants out of line. I don't
think this usually helps; the best I hope for is that it doesn't hurt.

In article <6190@bd.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>Sorry, I don't see that. Since the average constant is smaller than
>the average address, taking constants out of line and pooling them
>seems to me a guaranteed pessimisation

The relevant size to compare is that of the offset from the base register,
not the effective address. In particular, on many RISC architectures you
get an 8-bit or so offset free with your load instruction.

>(a) you don't save bits in the instruction, and may need more

True enough, if you don't count the immediate data as part of the
instruction.

>(b) the extra indirection is one more memory reference, which
>    is pure overhead

This is not true. You fetch the constant either from the constant pool or
from the instruction stream. If the instruction sizes are the same, the
number of fetches is the same in either case.

>(c) you have reduced locality by adding a gratuitous reference to
>    another part of the address space

I've replaced some number of fetches from the instruction stream with the
same number of fetches from the constant pool. Whether this helps or hurts
depends on the details of your caches. In the best case it can make a loop
fit completely in the I-cache when it didn't fit before.