Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!olivea!mintaka!bloom-beacon!eru!hagbard!sunic!mcsun!ukc!tcdcs!vax1.tcd.ie!hughesmp
From: hughesmp@vax1.tcd.ie
Newsgroups: comp.sys.acorn
Subject: Re: 32bit immediate load in ARM code
Message-ID: <1991May17.183541.1@vax1.tcd.ie>
Date: 17 May 91 18:35:41 GMT
References: <+|Q_L||@warwick.ac.uk>
Sender: news@cs.tcd.ie
Organization: Trinity College Dublin
Lines: 89
Nntp-Posting-Host: vax1

In article <+|Q_L||@warwick.ac.uk>, csuwr@warwick.ac.uk (Derek Hunter) writes:
> I was trying to cut down the number of labels my C compiler produces,
>  (having finally allowed the thing to access globals beyond the 4095 range),
>  and I (re)invented this:
> 
> You can do 	Ldr Rn,VERY_FAR_AWAY	; with:
> 
> 		Ldr Rn,[PC]
> 		Ldr Rn,[PC,Rn]
> 		DCD VERY_FAR_AWAY-P%	; if V_F_A preceeds this code
> 
> 		Ldr Rn,[PC]
> 		Ldr Rn,[PC,-Rn]		;   (You /can/ do -Rn, can't you?)
> 		DCD P%-VERY_FAR_AWAY    ; if it doesn't

Ldr - 4 cycles
Ldr - 4 cycles
Nop - 1 cycle
--------------
      9 cycles

Another problem - I'm not sure, but should these lines be...
DCD V_F_A-P%   ---> DCD V_F_A-P%-4 ?
DCD P%-V_F_A   ---> DCD P%+4-V_F_A ?
since R15, as seen by the second Ldr, would be pointing at the word
following the DCD? (I may be wrong, it could work; I'm not sure)
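
The arithmetic can be checked by hand. On the ARM, R15 used as a base
register reads as the address of the current instruction plus 8. A
minimal sketch in Python, where the addresses and the function name
are made up for illustration and the layout is assumed to be Ldr at A,
Ldr at A+4, DCD at A+8:

```python
def dcd_values(first_ldr_addr, target_addr):
    """Return the word the DCD must hold for the [PC,Rn] and [PC,-Rn] forms.

    Assumed layout: LDR at A, LDR at A+4, DCD at A+8 (so P% = A+8 at the DCD).
    """
    second_ldr_addr = first_ldr_addr + 4
    pc_seen = second_ldr_addr + 8          # PC reads as instruction address + 8
    add_form = target_addr - pc_seen       # for Ldr Rn,[PC,Rn]
    sub_form = pc_seen - target_addr       # for Ldr Rn,[PC,-Rn]
    return add_form, sub_form


A = 0x8000          # hypothetical address of the first Ldr
P = A + 8           # value of P% when the DCD is assembled
TARGET = 0x20000    # hypothetical VERY_FAR_AWAY

add_form, sub_form = dcd_values(A, TARGET)
print(add_form == TARGET - P - 4, sub_form == P + 4 - TARGET)
```

Under those assumptions both comparisons hold, i.e. the -4 / +4
adjustment is needed.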

It is also slow - much better is

Add Rn,Pc,#(within 4096 of address) \ This is a multi-instruction add -
                                    \ several adds to make the full value
Ldr Rn,[Rn,#(the error margin)]

Worst case here is realistically 6 cycles, possibly 7.
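
A rough sketch of how the offset might be split, in Python rather than
BASIC. It assumes the offset is non-negative and already measured from
the value R15 has in the first Add (a target behind the code would
need Sub and a negative Ldr offset). The name split_offset is
hypothetical. The low 12 bits go into the Ldr; the rest is cut into
8-bit fields on even bit boundaries, each of which is a legal ARM
immediate:

```python
def split_offset(offset):
    """Split a byte offset into ADD immediates plus a 12-bit LDR offset."""
    if not 0 <= offset <= 0xFFFFFFFF:
        raise ValueError("sketch only handles non-negative 32-bit offsets")
    ldr_offset = offset & 0xFFF            # fits the LDR immediate field
    rest = offset - ldr_offset             # multiple of 4096, handled by ADDs
    adds = []
    while rest:
        low = (rest & -rest).bit_length() - 1   # index of lowest set bit
        low -= low % 2                          # rotations must be even
        chunk = rest & (0xFF << low)            # 8-bit field at an even position
        adds.append(chunk)
        rest -= chunk
    return adds, ldr_offset


adds, ldr_offset = split_offset(0x3FFFFFC)   # largest word offset in 26 bits
print([hex(a) for a in adds], hex(ldr_offset))
```

For a 26-bit address space this never needs more than two Adds before
the Ldr, which is where the 6 cycle figure comes from.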

This can be implemented for the BASIC assembler as a FN, but the
assembler cannot work out the optimum number of Adds (not without
<n>-pass assembly that may never terminate): it may not know the
destination on Pass 1, yet it must reserve the instruction space
then... Thus the FN must be implemented as:

FNldr(n,a,o) - LDR Rn,a taking up an additional o instructions.

> 32 bit immediate constants can be read with
> 
> 		Ldr Rn,[PC]
> 		Bic Rn,Rn,# (( number >> 28 ) EOR 15) << 28
> 		DCD number OR &F0000000

Ldr - 4 cycles
Bic - 1 cycle
Nop - 1 cycle
--------------
      6 cycles
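
For reference, the quoted Bic trick does recover the number. The
stored word has its top nibble forced to &F (the "never" condition
code on current ARMs, presumably so that the data word does nothing
when it is executed), and the Bic clears exactly the top bits that
were not set in the original. A quick check in Python, with bic_trick
being a name invented for this sketch:

```python
MASK32 = 0xFFFFFFFF


def bic_trick(number):
    """Model DCD number OR &F0000000 followed by the quoted Bic."""
    number &= MASK32
    stored = number | 0xF0000000               # what the DCD holds
    mask = ((number >> 28) ^ 15) << 28         # top-nibble bits to clear
    loaded = stored & ~mask & MASK32           # effect of Bic Rn,Rn,#mask
    return stored, mask, loaded


for n in (0x12345678, 0x00000000, 0xFFFFFFFF, 0x80000001):
    stored, mask, loaded = bic_trick(n)
    print(hex(n), hex(stored), hex(mask), loaded == n)
```

The mask only ever touches the top four bits, so it always fits an
ARM immediate.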

It is faster to do...

MOV Rn,#x AND &FF
ORR Rn,Rn,#x AND &FF00
ORR Rn,Rn,#x AND &FF0000
ORR Rn,Rn,#x AND &FF000000

- 4 cycles, and there won't be the problems you speculate on in
possible future CPUs... Again, you can optimise this further if you
know certain bits of your number will be 0; it might be faster to do...
MOV Rn,#x AND &F000000F:ORR Rn,Rn,#x AND &FF00 \ 2 cycles
depending on your numbers. If you are using the BASIC assembler,
write a FN that works out the optimal code - then you can just say
FNmov(n,x)
and it will do the fastest possible implementation...
Such a FN would be fairly trivial to implement.
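
One way such a FN could pick its constants, sketched in Python rather
than BASIC. This is not the library's code; mov_chunks and mov_listing
are names invented here. An ARM immediate is an 8-bit value rotated
right by an even amount, so the sketch tries every even starting
rotation, greedily grabs 8-bit fields from there, and keeps the
shortest result:

```python
MASK32 = 0xFFFFFFFF


def rol(value, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((value << n) | (value >> (32 - n))) & MASK32


def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated by an even amount."""
    return any(rol(value, r) < 0x100 for r in range(0, 32, 2))


def mov_chunks(x):
    """Split x into as few ARM immediates as this greedy search finds."""
    x &= MASK32
    if x == 0:
        return [0]
    best = None
    for start in range(0, 32, 2):
        rotated = rol(x, 32 - start)           # bit `start` of x moves to bit 0
        chunks, pos = [], 0
        while pos < 32:
            if (rotated >> pos) & 3:           # a bit is set in this even pair
                field = rotated & (0xFF << pos) & MASK32
                chunks.append(rol(field, start))   # undo the rotation
                pos += 8
            else:
                pos += 2
        if best is None or len(chunks) < len(best):
            best = chunks
    return best


def mov_listing(reg, x):
    """Format the chunks as one MOV followed by ORRs (BASIC assembler style)."""
    lines = []
    for i, chunk in enumerate(mov_chunks(x)):
        if i == 0:
            lines.append("MOV R%d,#&%X" % (reg, chunk))
        else:
            lines.append("ORR R%d,R%d,#&%X" % (reg, reg, chunk))
    return lines


print("\n".join(mov_listing(0, 0xF000FF0F)))
```

For a value like &F000FF0F this finds the two-instruction form with
the &F000000F constant, because the search allows fields that wrap
round from bit 31 to bit 0.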

Incidentally, we have a BASIC library which implements all these
FNs, including FNadr (same limitations as FNldr apply), FNsp
(assigns space; equivalent to ]P%+=sp:[OPT pass), FNfi (assigns
space to load in the given file, and loads the file in on pass 2),
and a fair few others methinks... We'll post them to c.s.a. if
there is any interest... One limitation: because there is no
standard name for the 'pass' variable, we assume it is called
'pass'. This is easily changed (although really, you should follow
our example ;-): a function FNpass is used, which we have as...
DEFFNpass=pass
Just change the pass to whatever you use...

Merlin,
--SICK--
You suffer... But why?