Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!elroy.jpl.nasa.gov!jarthur!uunet!mcsun!ukc!acorn!john
From: john@acorn.co.uk (John Bowler)
Newsgroups: comp.sys.acorn
Subject: Re: 32bit immediate load in ARM code
Summary: Probably not worth it.
Keywords: compilers code-generation
Message-ID: <7133@acorn.co.uk>
Date: 17 May 91 11:52:31 GMT
References: <+|Q_L||@warwick.ac.uk>
Organization: Acorn Computers Ltd, Cambridge, UK
Lines: 100

In article <+|Q_L||@warwick.ac.uk> csuwr@warwick.ac.uk (Derek Hunter) writes:
>I was trying to cut down the number of labels my C compiler produces,
> (having finally allowed the thing to access globals beyond the 4095 range),
> and I (re)invented this:
>
>You can do Ldr Rn,VERY_FAR_AWAY ; with:
>
>       Ldr Rn,[PC]
>       Ldr Rn,[PC,Rn]
>       DCD VERY_FAR_AWAY-P%    ; if V_F_A precedes this code
>
>       Ldr Rn,[PC]
>       Ldr Rn,[PC,-Rn]         ; (You /can/ do -Rn, can't you?)

Yes, this is valid.

>       DCD P%-VERY_FAR_AWAY    ; if it doesn't
>
> . . . and they are still relative addresses cunningly enough.
> (In fact, I think they exceed the addressing space!)

This takes three instructions to read a given memory location at an
offset of up to +/- 28 bits (the top 4 bits of the data word are set
to F to give the NV condition code). How about:-

        ADD     Rn, PC, #x LSL 12       ; PC without PSR bits, 8 bit constant
        ADD     Rn, Rn, #y LSL 20
        LDR     Rn, [Rn, #z]            ; 12 bit offset

which gives a 28 bit (positive) PC offset; similarly, using SUB will
allow generation of a negative offset. This sequence of instructions
has the advantage that it is faster (it avoids one LDR) on non-ARM3
machines. Also, some offsets can be represented more efficiently - in
particular an arbitrary 20 bit offset only requires two instructions.

Notice that *neither* approach allows link time relocation. Derek's
algorithm appears to allow it but, in practice, the compiler would not
know whether the offset was positive or negative, so it could not
choose between the [PC,Rn] and [PC,-Rn] forms of the LDR instruction.
(This is all because of a deficiency in the current AOF and A.OUT
object module formats, which do not allow the appropriate relocation
forms for the instructions which would be needed.)

>
>32 bit immediate constants can be read with
>
>       Ldr Rn,[PC]
>       Bic Rn,Rn,# (( number >> 28 ) EOR 15) << 28
>       DCD number OR &F0000000
>
> or equivalent, (but Bic impresses people, because no-one knows
> what it does, and those shift-28s are luvverly).

Again, three instructions will generate any +/-24 bit constant
(obviously), and (additionally) a very large number of the others.
(Possibly even all of them, given that there are 36 immediate value
bits in the three instructions, plus quite a lot of bits corresponding
to the selection of different ALU instruction types).

>My main point of interest is this: On an ARM 3, would the DCD be read into
> cache in an s cycle during the final stage of the Ldr Rn,[PC], or does it
> take an n cycle all of its very own?

The cache is mixed instruction + data, so the LDR causes no memory
access other than that which occurs as a result of the instruction
loading.

> Is this nice on an ARM 3's cache?

Yes.

> Is it nice at all?

Hum. See below.

> Was this the intentional use of NV?

I don't think so. We have used NV when patching binaries (to remove
instructions we don't want :-)) and have recommended its use as a NOOP
(after processor mode changes for example).

> Has Acorn used it?

Not in this way as far as I know.

> Is the UndefinedNV really a problem?

Currently anythingNV is ignored; for example co-processors don't see
the instruction, no instruction decoding takes place (I think...)

> Will this latter be supported in future releases of the hardware?

NV instructions are hardly ever used. There is a lot of pressure on
the instruction space; there are very few slots left in it, yet NV
accounts for 1/16 of all the instructions in the ARM instruction set!
It seems to me very likely that future developments will use NV
instructions in some way, which would cause the above to cease to
work. Given that the actual advantage of the suggested code is very
small (the alternative costs at most one extra instruction, and only
for some very rarely used 32 bit constants), it is probably worth
avoiding.

John Bowler (jbowler@acorn.co.uk)
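
The ADD/ADD/LDR sequence suggested above splits a positive PC-relative
offset into two 8 bit ALU immediates and one 12 bit LDR offset. A
minimal sketch of that split in Python follows. The function names are
hypothetical, the offset is assumed to be measured from whatever value
the PC reads as in the first ADD, and the immediate check assumes the
usual ARM operand encoding of an 8 bit constant rotated right by an
even amount. It is an illustration only, not anything taken from an
Acorn compiler.

```python
def split_pc_offset(offset):
    """Split a positive offset into (x, y, z) for the sequence
         ADD Rn, PC, #x LSL 12 / ADD Rn, Rn, #y LSL 20 / LDR Rn, [Rn, #z]
    """
    if not 0 <= offset < (1 << 28):
        raise ValueError("offset must fit in 28 bits (use SUB for negative)")
    z = offset & 0xFFF            # bits 0-11: LDR's 12 bit offset
    x = (offset >> 12) & 0xFF     # bits 12-19: first ADD's 8 bit constant
    y = (offset >> 20) & 0xFF     # bits 20-27: second ADD's 8 bit constant
    return x, y, z


def instruction_count(offset):
    """Instructions needed if an ADD with a zero constant is dropped."""
    x, y, _ = split_pc_offset(offset)
    return 1 + (x != 0) + (y != 0)


def is_arm_immediate(value):
    """True if value is an 8 bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate right by rot.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 0x100:
            return True
    return False


x, y, z = split_pc_offset(0x1234567)
print(hex(x), hex(y), hex(z))     # 0x34 0x12 0x567
```

Both shifted constants (x LSL 12 and y LSL 20) sit at even bit
positions, so they pass is_arm_immediate; an offset below 2^20 leaves
y at zero, which is the two instruction case mentioned above.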