Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uflorida!gatech!udel!rochester!pt.cs.cmu.edu!andrew.cmu.edu!zs01+
From: zs01+@andrew.cmu.edu (Zalman Stern)
Newsgroups: comp.arch
Subject: Re: Endian reversing MOVEs
Message-ID:
Date: 7 Feb 89 21:51:55 GMT
References: <759@atanasoff.cs.iastate.edu> , <772@atanasoff.cs.iastate.edu>
Organization: Information Technology Center, Carnegie Mellon, Pittsburgh, PA
Lines: 83
In-Reply-To: <772@atanasoff.cs.iastate.edu>

> *Excerpts from ext.nn.comp.arch: 7-Feb-89 Re: Endian reversing MOVEs Joe*
> *Keane@andrew.cmu.edu (713)*

> That's 11 memory references for something which has nothing to do with memory.
> It only takes 6 instructions (0 memory references) on the RT.

> *Excerpts from ext.nn.comp.arch: 7-Feb-89 Re: Endian reversing MOVEs John*
> *Hascall@atanasoff.c (636)*

> BTW, I could do it in less than 11, if I assumed some things about
> the addressing mode of the operands and/or the availability of
> a register to work in, but for my situation I needed a completely
> general method.

> John Hascall

The RT has two advantages here. First, it has no addressing modes to worry about. Second, the RT has instructions for moving characters around within registers. The first is a result of a RISC load/store architecture. The second might not exactly be RISC, but probably isn't too far off. (I don't have any figures as to how often the MC03-type instructions are used, but they shouldn't add much complexity.)

For comparison purposes I've jotted down code for the RT, the AMD 29000, and the MIPS R3000. These all assume you are byte switching a value from one register into a distinct register. This is a minor difference from the original macro, but I think it is the reasonable case to consider.

Here's the RT code:

	mc31	SRC, DEST	; DEST[3] = SRC[1]
	mc23	DEST, DEST	; DEST[2] = DEST[3]
	mc32	SRC, DEST	; DEST[3] = SRC[2]
	mc13	DEST, DEST	; DEST[1] = DEST[3]
	mc30	SRC, DEST	; DEST[3] = SRC[0]
	mc03	SRC, DEST	; DEST[0] = SRC[3]

6 instructions, no temporary registers or other state modified.

The AMD 29000 has byte insertion/extraction functions. In the following, setbp sets a special "byte pointer" register. The exbyte instruction takes the byte of the source addressed by the byte pointer and places it in the low order byte of the destination. (I use constants for the setbp instruction since the actual values depend on the byte order bit in the processor configuration register...) The code looks like so:

	and	DEST, SRC, 0xff		; DEST = SRC & 0xff
	sll	DEST, DEST, 8		; DEST <<= 8
	setbp	LOWMIDBYTE		; Setup special register with constant
	exbyte	SRC, DEST, DEST		; DEST[0] = SRC[1] essentially
	sll	DEST, DEST, 8		; DEST <<= 8
	setbp	LOWHIGHBYTE		; Setup special register with constant
	exbyte	SRC, DEST, DEST		; DEST[0] = SRC[2]
	sll	DEST, DEST, 8		; DEST <<= 8
	setbp	HIGHBYTE		; Setup special register with constant
	exbyte	SRC, DEST, DEST		; DEST[0] = SRC[3]

10 single-cycle instructions and no temporary register, but the byte pointer is modified.

Finally, the R3000 routine, which is fairly generic and could be implemented on other architectures:

	andi	temp, SRC, 0x00ff	; temp = low byte of SRC
	sll	DEST, temp, 24		; DEST = low byte moved to the top
	andi	temp, SRC, 0xff00	; temp = second byte of SRC
	sll	temp, temp, 8		; move it up one byte
	or	DEST, temp, DEST	; merge into DEST
	lui	temp, 0x00ff		; temp = 0x00ff0000
	and	temp, SRC, temp		; temp = third byte of SRC
	srl	temp, temp, 8		; move it down one byte
	or	DEST, temp, DEST	; merge into DEST
	lui	temp, 0xff00		; temp = 0xff000000
	and	temp, SRC, temp		; temp = high byte of SRC
	srl	temp, temp, 24		; move it to the bottom
	or	DEST, temp, DEST	; merge into DEST

13 single-cycle instructions, one temporary register.

As I don't regularly program in assembly language, much less on all three of the above mentioned processors, I don't make any promises about the correctness or optimality of the above code. It is presented for rough comparison only.
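
For anyone who wants to check the byte shuffling without one of these machines handy, here is a rough C sketch of two of the approaches above: the generic shift-and-mask routine and the six-step move-character sequence. All of the names (swap_shift_mask, move_char, swap_move_char, check_swaps) are made up for illustration, and move_char is only a model of an mcXY-style step. It assumes byte 0 is the most significant byte of the word, which is what makes the six-step sequence come out as a full reversal. Treat it as a sanity check, not as a statement of what any of the three processors actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Generic shift-and-mask reversal, one statement per step of the
 * mask/shift/merge pattern: isolate a byte, move it, OR it in. */
uint32_t swap_shift_mask(uint32_t src)
{
    uint32_t dest, temp;

    temp = src & 0x000000ffu;   /* low byte            */
    dest = temp << 24;          /* ... goes to the top */
    temp = src & 0x0000ff00u;   /* second byte         */
    temp <<= 8;                 /* ... up one byte     */
    dest |= temp;
    temp = src & 0x00ff0000u;   /* third byte          */
    temp >>= 8;                 /* ... down one byte   */
    dest |= temp;
    temp = src & 0xff000000u;   /* high byte           */
    temp >>= 24;                /* ... to the bottom   */
    dest |= temp;
    return dest;
}

/* Hypothetical model of one move-character step: copy byte `from` of
 * src into byte `to` of dest, leaving dest's other bytes alone.
 * ASSUMPTION: byte 0 is the most significant byte. */
uint32_t move_char(uint32_t dest, uint32_t src, int to, int from)
{
    unsigned to_shift = 8u * (3u - (unsigned)to);
    unsigned from_shift = 8u * (3u - (unsigned)from);
    uint32_t byte = (src >> from_shift) & 0xffu;

    return (dest & ~((uint32_t)0xffu << to_shift)) | (byte << to_shift);
}

/* The six-step sequence, expressed with the model above. */
uint32_t swap_move_char(uint32_t src)
{
    uint32_t dest = 0;

    dest = move_char(dest, src, 3, 1);   /* DEST[3] = SRC[1]  */
    dest = move_char(dest, dest, 2, 3);  /* DEST[2] = DEST[3] */
    dest = move_char(dest, src, 3, 2);   /* DEST[3] = SRC[2]  */
    dest = move_char(dest, dest, 1, 3);  /* DEST[1] = DEST[3] */
    dest = move_char(dest, src, 3, 0);   /* DEST[3] = SRC[0]  */
    dest = move_char(dest, src, 0, 3);   /* DEST[0] = SRC[3]  */
    return dest;
}

/* Self-check: aborts if either sketch fails to reverse the bytes. */
void check_swaps(void)
{
    assert(swap_shift_mask(0x11223344u) == 0x44332211u);
    assert(swap_move_char(0x11223344u) == 0x44332211u);
}
```

Calling check_swaps() from a main() of your own exercises both routines on 0x11223344. Note that the last mask in swap_shift_mask is redundant in C, since the right shift by 24 already discards the lower bytes; it is kept so each step lines up with the mask/shift/merge pattern.
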

I'm sure someone will correct me if I got it wrong :-)

Sincerely,
Zalman Stern
Internet: zs01+@andrew.cmu.edu
Usenet: I'm soooo confused...
Information Technology Center, Carnegie Mellon, Pittsburgh, PA 15213-3890