Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!brutus.cs.uiuc.edu!jarthur!elroy.jpl.nasa.gov!ames!pacbell!osc!jgk
From: jgk@osc.COM (Joe Keane)
Newsgroups: comp.arch
Subject: Re: 64-bit addresses
Message-ID: <2068@osc.COM>
Date: 23 Feb 90 12:16:42 GMT
References: <9708@spool.cs.wisc.edu> <20270@cfctech.cfc.com> <11112@encore.Encore.COM> <10795@snow-white.udel.EDU> <2027@osc.COM> <162@gollum.twg.com> <2054@osc.COM> <6190@bd.sei.cmu.edu>
Reply-To: jgk@osc.osc.COM (Joe Keane)
Organization: Object Sciences Corp., Menlo Park, CA
Lines: 40

In article <2054@osc.COM> i write:

>I agree this is hard, but it's an interesting optimization and can only
>improve your performance.  Of course it's completely impossible if your
>constants are embedded in the instruction stream.

I'm not sure what the antecedent of `this' is in my post, but what i meant to
refer to is replacing multiple instances of the same constant with multiple
references to one instance.  This is what i claim can only help.
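
A minimal sketch of that replacement, assuming the constants are plain
integers and that a "reference" is just an index into the pool.  The names
build_pool and uses are made up for illustration; a real compiler or
assembler would do this over its own literal table.

```python
def build_pool(constants):
    """Store each distinct constant once; return the pool and, for each
    original use, the index of its single pooled instance."""
    pool = []       # one entry per distinct constant, in first-seen order
    index_of = {}   # constant value -> its index in pool
    refs = []       # refs[i] is the pool index that use i now refers to
    for c in constants:
        if c not in index_of:
            index_of[c] = len(pool)
            pool.append(c)
        refs.append(index_of[c])
    return pool, refs

# Hypothetical instruction stream that uses one constant three times.
uses = [0x12345678, 0xDEADBEEF, 0x12345678, 0x12345678]
pool, refs = build_pool(uses)
print(len(uses), "uses ->", len(pool), "pool entries, refs =", refs)
```

Four uses shrink to two pool entries here; the references themselves cost
whatever offset field the load instruction already provides.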

The first step, though, is to take the constants out of line.  I don't think
this usually helps; the best i hope for is that it doesn't hurt.

In article <6190@bd.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>Sorry, I don't see that.  Since the average constant is smaller than
>the average address, taking constants out of line and pooling them
>seems to me a guaranteed pessimisation

The relevant size to compare is that of the offset from the base register, not
the effective address.  In particular, on many RISC architectures, you get an
8-bit or so offset free with your load instruction.

>(a) you don't save bits in the instruction, and may need more

True enough, if you don't count the immediate data as part of its instruction.

>(b) the extra indirection is one more memory reference, which
>    is pure overhead

This is not true.  You either fetch the constant from the constant pool or the
instruction stream.  If the instruction sizes are the same, the number of
fetches is the same in either case.
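
The counting argument can be sketched as below, under stated assumptions:
every instruction and every constant is one word, an inline constant is one
extra word in the instruction stream, and a pooled constant is one word read
through a load of the same size as any other instruction.  The function name
and the numbers are hypothetical.

```python
def fetch_counts(other_instrs, const_uses, pooled):
    """Count one-word fetches for a single pass, split by where they come from."""
    if pooled:
        # Each constant use: one load instruction plus one word from the pool.
        return {"instr": other_instrs + const_uses, "pool": const_uses}
    # Each constant use: one instruction word plus one immediate word,
    # both fetched from the instruction stream.
    return {"instr": other_instrs + 2 * const_uses, "pool": 0}

inline = fetch_counts(other_instrs=10, const_uses=3, pooled=False)
pooled = fetch_counts(other_instrs=10, const_uses=3, pooled=True)
print(sum(inline.values()), sum(pooled.values()))  # prints "16 16"
```

The totals match; only the source of three of the fetches changes.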

>(c) you have reduced locality by adding a gratuitous reference to
>    another part of the address space

I've replaced some number of fetches from the instruction stream with the same
number of fetches from the constant pool.  Whether this helps or hurts depends
on how your caches are organized.  In the best case it can make a loop that
didn't fit before fit completely in the I-cache.
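
A rough illustration of that best case, assuming one-word instructions,
one-word constants, a separate I-cache, and made-up sizes (a 60-instruction
loop with 8 constant uses and a 64-word I-cache).  The function loop_fits is
hypothetical and ignores associativity and line size.

```python
def loop_fits(instrs, const_uses, icache_words, pooled):
    """True if the loop's instruction-stream footprint fits in the I-cache."""
    # Inline constants occupy extra words in the instruction stream;
    # pooled constants live elsewhere and are reached as data.
    words_in_stream = instrs if pooled else instrs + const_uses
    return words_in_stream <= icache_words

print(loop_fits(60, 8, 64, pooled=False))  # 68 words: does not fit
print(loop_fits(60, 8, 64, pooled=True))   # 60 words: fits
```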