Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!olivea!mintaka!bloom-beacon!eru!hagbard!sunic!mcsun!ukc!tcdcs!vax1.tcd.ie!hughesmp
From: hughesmp@vax1.tcd.ie
Newsgroups: comp.sys.acorn
Subject: Re: 32bit immediate load in ARM code
Message-ID: <1991May17.183541.1@vax1.tcd.ie>
Date: 17 May 91 18:35:41 GMT
References: <+|Q_L||@warwick.ac.uk>
Sender: news@cs.tcd.ie
Organization: Trinity College Dublin
Lines: 89
Nntp-Posting-Host: vax1

In article <+|Q_L||@warwick.ac.uk>, csuwr@warwick.ac.uk (Derek Hunter) writes:
> I was trying to cut down the number of labels my C compiler produces,
> (having finally allowed the thing to access globals beyond the 4095 range),
> and I (re)invented this:
>
> You can do Ldr Rn,VERY_FAR_AWAY ; with:
>
>     Ldr Rn,[PC]
>     Ldr Rn,[PC,Rn]
>     DCD VERY_FAR_AWAY-P%    ; if V_F_A preceeds this code
>
>     Ldr Rn,[PC]
>     Ldr Rn,[PC,-Rn]         ; (You /can/ do -Rn, can't you?)
>     DCD P%-VERY_FAR_AWAY    ; if it doesn't

    Ldr - 4 cycles
    Ldr - 4 cycles
    Nop - 1 cycle
    --------------
          9 cycles

Another problem - I'm not sure, but should some lines be...

    DCD V_F_A-P%  --->  DCD V_F_A-P%-4  ?
    DCD P%-V_F_A  --->  DCD P%+4-V_F_A  ?

As R15 would be pointing at the word following the DCD when the second Ldr
executes? (I may be wrong, it could work; I'm not sure)

It is also slow - much better is

    Add Rn,Pc,#(within 4096 of address)  \ This is a multi-instruction add -
                                         \ several adds to make the full value
    Ldr Rn,[Rn,#(the error margin)]

Worst case here is realistically 6 cycles, possibly 7. This can be implemented
for the BASIC assembler as a FN, but the assembler cannot work out the optimum
number of Adds (short of repeated assembly passes, which may never terminate):
it may not know the destination on pass 1, yet it must reserve the instruction
space then... Thus the FN must be implemented as:

    FNldr(n,a,o) - LDR Rn,a taking up an additional o instructions.
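As a rough sketch of the Add/Ldr split above (in Python rather than BASIC, purely so it can be run): the names split_far_offset and plan_far_ldr are made up for illustration and are not from the library mentioned in this post. It assumes R15 reads as the address of the instruction plus 8, that the low 12 bits of the distance go into the Ldr offset, and that a backward reference uses Sub and a negative Ldr offset instead.

```python
def split_far_offset(offset):
    """Split a signed byte offset into ADD/SUB immediates plus an LDR offset.

    Returns (op, chunks, ldr_offset). op is "ADD" or "SUB"; each chunk is an
    8-bit field starting at an even bit position, so it fits an ARM
    data-processing immediate; ldr_offset has magnitude <= 4095.
    """
    op = "ADD" if offset >= 0 else "SUB"
    mag = abs(offset)
    ldr_mag = mag & 0xFFF        # the low 12 bits ride along in the LDR
    rest = mag - ldr_mag         # whatever is left needs ADDs (or SUBs)
    chunks = []
    while rest:
        low = (rest & -rest).bit_length() - 1   # index of lowest set bit
        low -= low % 2                          # immediates rotate in steps of 2
        chunk = rest & (0xFF << low)
        chunks.append(chunk)
        rest -= chunk
    ldr_offset = ldr_mag if offset >= 0 else -ldr_mag
    return op, chunks, ldr_offset


def plan_far_ldr(instr_addr, target):
    """Plan the sequence for a first ADD/SUB (or lone LDR) at instr_addr."""
    # Assumption: R15 reads as the instruction's own address plus 8.
    return split_far_offset(target - (instr_addr + 8))
```

For example, a distance of &12345678 comes out as three Adds (&45000, &2300000, &10000000) and an Ldr offset of &678, which is the 7-cycle case by the counts above. A FNldr(n,a,o) built this way would still have to check that the number of chunks does not exceed o.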

> 32 bit immediate constants can be read with
>
>     Ldr Rn,[PC]
>     Bic Rn,Rn,# (( number >> 28 ) EOR 15) << 28
>     DCD number OR &F0000000

    Ldr - 4 cycles
    Bic - 1 cycle
    Nop - 1 cycle
    --------------
          6 cycles

It is faster to do...

    MOV Rn,#x AND &FF
    ORR Rn,Rn,#x AND &FF00
    ORR Rn,Rn,#x AND &FF0000
    ORR Rn,Rn,#x AND &FF000000

- 4 cycles, and there won't be the problems you speculate on, in possible
future CPUs...

Again with this, you can optimise it further if you know certain bits of your
number will be 0; it might be faster to do...

    MOV Rn,#x AND &F000000F:ORR Rn,Rn,#x AND &FF00    \ 2 cycles

depending on your numbers; write a FN that will work out the optimal code, if
you are using the BASIC assembler - then you can just say FNmov(n,x) and it
will do the fastest possible implementation... Such a FN would be fairly
trivial to implement.

Incidentally, we have a BASIC library which implements all these FNs,
including FNadr (same limitations as FNldr apply), FNsp (assign space ;
equivalent to ]P%+=sp:[OPT pass), FNfi (assigns space to load in the given
file, and loads the file in on pass 2), and a fair few others methinks...
We'll post them to c.s.a. if there would be any interest...

One limitation: because there is no standard name for the 'pass' variable, we
assume it is called 'pass' - this is easily changed (although really, you
should follow our example ;-) - a function FNpass is used, which we have as...

    DEFFNpass=pass

Just change the pass to whatever you use...

Merlin,
--SICK--
You suffer... But why?
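
A rough sketch of how an FNmov(n,x) along the lines described above might pick its instructions, written in Python rather than BASIC so it can be run. The names mov_chunks and fnmov are hypothetical stand-ins, not the library routines. It assumes the usual ARM immediate form (an 8-bit value rotated right by an even amount), tries each even bit position as a starting point and keeps the shortest MOV/ORR sequence it finds. It does not consider MVN.

```python
MASK32 = 0xFFFFFFFF


def ror32(v, r):
    """Rotate a 32-bit value right by r bits."""
    r %= 32
    return ((v >> r) | (v << (32 - r))) & MASK32


def rol32(v, r):
    """Rotate a 32-bit value left by r bits."""
    return ror32(v, (32 - r) % 32)


def is_arm_immediate(v):
    """True if v is an 8-bit value rotated right by an even amount."""
    return any(rol32(v, r) < 0x100 for r in range(0, 32, 2))


def mov_chunks(x):
    """Split x into ARM immediates whose OR is x, using as few as it can find."""
    x &= MASK32
    if x == 0:
        return [0]
    best = None
    for start in range(0, 32, 2):
        rot = ror32(x, start)          # bit `start` of x becomes bit 0
        chunks, pos = [], 0
        while pos < 32:
            if (rot >> pos) & 3 == 0:
                pos += 2               # nothing set in this pair of bits
                continue
            field = rot & (0xFF << pos) & MASK32
            chunks.append(rol32(field, start))   # back to original position
            pos += 8
        if best is None or len(chunks) < len(best):
            best = chunks
    return best


def fnmov(n, x):
    """Return assembler text loading x into Rn: one MOV, then ORRs."""
    chunks = mov_chunks(x)
    lines = [f"MOV R{n},#&{chunks[0]:X}"]
    lines += [f"ORR R{n},R{n},#&{c:X}" for c in chunks[1:]]
    return lines
```

With x = &A000BC0D this finds the two-instruction form from the post, the pieces being x AND &F000000F and x AND &FF00, where a fixed byte-by-byte split would take three instructions.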