Path: utzoo!utgpu!attcan!uunet!ncrlnk!ncrcae!ece-csc!mcnc!xanth!nic.MR.NET!hal!cwjcc!mailrus!tut.cis.ohio-state.edu!husc6!bbn!rochester!pt.cs.cmu.edu!sei!sei.cmu.edu!firth
From: firth@sei.cmu.edu (Robert Firth)
Newsgroups: comp.arch
Subject: Re: HW v. SW (was RISC v. CISC --more misconceptions)
Message-ID: <7629@aw.sei.cmu.edu>
Date: 7 Nov 88 18:14:09 GMT
References: <156@gloom.UUCP> <18931@apple.Apple.COM> <40@sopwith.UUCP> <998@l.cc.purdue.edu> <1622@scolex> <866@cernvax.UUCP>
Sender: netnews@sei.cmu.edu
Reply-To: firth@bd.sei.cmu.edu (Robert Firth)
Organization: Carnegie-Mellon University, SEI, Pgh, Pa
Lines: 38

In article <866@cernvax.UUCP> hjm@cernvax.UUCP (Hubert Matthews) writes:

>The INMOS T800 has an instruction bitrevword, which turns a
>little-endian word into a big-endian word, effectively doing a
>reflection in the middle.  Great for FFT shuffle routines.  In
>software, it takes quite some time.  In hardware it takes just over
>1 microsecond on a 30MHz part.

1 usec at 30MHz is about 30 cycles, I guess.  Here's my quick and
dirty attempt at the same in software, on the MIPS R2000.  We assume
DEST := bitrevword(SRC), where both are 4-byte variables, and T is
a 256-byte lookup table of reversed bytes:

	la	U0,SRC		; fetch address of SRC
	la	U1,DEST		; fetch address of DEST
	la	U2,T		; fetch address of T

	lbu	U3,3(U0)	; get byte 3 of SRC (unsigned, it is an index)
	lbu	U4,2(U0)	; get byte 2 of SRC
	addu	U5,U3,U2	; index lookup table
	lbu	U6,0(U5)	; get translation
	addu	U7,U4,U2
	sb	U6,0(U1)	; store in byte 0 of DEST
	lbu	U8,0(U7)
	lbu	U3,1(U0)	; get byte 1 of SRC
	sb	U8,1(U1)	; store in byte 1 of DEST
	addu	U5,U3,U2
	lbu	U4,0(U0)	; get byte 0 of SRC
	lbu	U6,0(U5)
	addu	U7,U4,U2
	lbu	U8,0(U7)
	sb	U6,2(U1)	; store in byte 2 of DEST
	sb	U8,3(U1)	; store in byte 3 of DEST

(the interleaving is necessary to avoid load delays)

The above is 19 cycles, in software.
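
For anyone who would rather read the table lookup above in a high-level
language, here is a rough C sketch of the same idea.  It is not taken
from the assembly listing: the names reverse_byte, init_table and the
shift-and-mask formulation are my own assumptions, and only T and
bitrevword echo names used above.  Byte i of the source is reversed
through the table and placed in byte 3-i of the result.

```c
#include <assert.h>
#include <stdint.h>

/* T[b] holds b with its 8 bits reversed (hypothetical name, as in the post). */
static uint8_t T[256];

/* Reverse the bits of a single byte; used only to fill the table. */
static uint8_t reverse_byte(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1u));  /* move the low bit of b into r */
        b >>= 1;
    }
    return r;
}

/* Fill the 256-entry table once, before the first call to bitrevword. */
static void init_table(void)
{
    for (int i = 0; i < 256; i++)
        T[i] = reverse_byte((uint8_t)i);
    assert(T[0x01] == 0x80);                 /* quick sanity check */
}

/* Reverse all 32 bits: four table lookups, bytes swapped end for end. */
static uint32_t bitrevword(uint32_t src)
{
    return ((uint32_t)T[src & 0xFF] << 24)
         | ((uint32_t)T[(src >> 8) & 0xFF] << 16)
         | ((uint32_t)T[(src >> 16) & 0xFF] << 8)
         |  (uint32_t)T[(src >> 24) & 0xFF];
}
```

Call init_table() once, then bitrevword(x) as often as needed.  The
table costs 256 bytes; whether a compiler turns this into anything as
tight as hand-scheduled code is a separate question.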