Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!decwrl!amdcad!rpw3
From: rpw3@amdcad.AMD.COM (Rob Warnock)
Newsgroups: comp.arch
Subject: Re: quest for breakthroughs (long)
Message-ID: <24579@amdcad.AMD.COM>
Date: 23 Feb 89 12:17:58 GMT
References: <740@tetons.UUCP> <76700068@p.cs.uiuc.edu> <671@oracle.oracle.com>
Reply-To: rpw3@amdcad.UUCP (Rob Warnock)
Organization: [Consultant] San Mateo, CA
Lines: 51

In article <671@oracle.oracle.com> csimmons@oracle.UUCP writes:
+---------------
| (One of my favorite puzzles:  You have two 32-bit registers
| containing some pattern of bits.  The most significant bit of a register
| is numbered "0".  You want to move bits 0, 2, 4, and 16 from the
| first register, and bit 0 of the second register into some subset
| of the low-order eight bits of some register.  The high-order 24 bits
| of the destination register must end up as zero.  The three bits in
| the low-order byte of the destination register which are not copies
| of the five bits of interest may have any value at all.)
+---------------

O.k., I'll bite. (Byte? Nybble? Chomp at the bit?  ;-}  ;-} )

On the Am29000, you can do this in five instructions of straight-line code,
involving a non-obvious use of the 29k "extract" instruction (which extracts
a 32-bit field from a 64-bit source: any two of the 32-bit regs).

Let "x" be the first source reg, "y" be the second, "t1" & "t2" temp regs,
and "z" the result. (Depending on where the result goes and whether either
or both of the source regs may be destroyed, one or both of the temps may
be unneeded.)
The code is:

	srl	t1,x,27		; t1<27:31> = x<0:4>, t1<0:26> = 0
	sll	t2,x,16		; t2<0> = x<16>  (t2<1:15> = "don't care")
	mtsrim	FC,1		; condition Funnel-shifter Count
	extract	t1,t1,t2	; t1<0:31> = t1<1:31> cat t2<0>
	extract	z,t1,y		; z<0:31> = t1<1:31> cat y<0>

The result (BigEndian bit numbers) satisfies the given conditions
(bits 24, 26, & 28 are the "don't care" bits):

  0:23     24      25      26      27      28      29      30      31
+- - - -+-------+-------+-------+-------+-------+-------+-------+-------+
| ...0  |   0   | x<0>  | x<1>  | x<2>  | x<3>  | x<4>  | x<16> | y<0>  |
+- - - -+-------+-------+-------+-------+-------+-------+-------+-------+

"Extract" is also handy in 29k code when shifting multi-word quantities.
A highly optimized version of "memcpy()" does non-aligned byte copies with
inner loops of "load_multiple, extract, extract..., store_multiple".

(Without using "extract", the best I could do was seven instructions,
which happened to give the same result pattern.)


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403
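
For anyone without a 29k handy, the five-instruction sequence above can be
checked with a rough C model. This is only a sketch: it assumes "extract"
behaves as the instruction comments describe (shift the 64-bit register
pair left by FC and keep the upper 32 bits), and the names extract(),
bit(), pack() and check_pack() are made up for illustration, not taken
from any 29k toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of the 29k "extract": form the 64-bit value hi:lo,
   shift it left by fc bits, keep the upper 32 bits.  Valid for
   0 <= fc <= 31. */
static uint32_t extract(uint32_t hi, uint32_t lo, unsigned fc)
{
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)((pair << fc) >> 32);
}

/* Big-endian bit numbering, as in the post: bit 0 is the MSB. */
static uint32_t bit(uint32_t w, unsigned n)
{
    return (w >> (31u - n)) & 1u;
}

/* The five-instruction sequence, one statement per instruction. */
static uint32_t pack(uint32_t x, uint32_t y)
{
    uint32_t t1 = x >> 27;        /* srl     t1,x,27  */
    uint32_t t2 = x << 16;        /* sll     t2,x,16  */
    unsigned fc = 1;              /* mtsrim  FC,1     */
    t1 = extract(t1, t2, fc);     /* extract t1,t1,t2 */
    return extract(t1, y, fc);    /* extract z,t1,y   */
}

/* Assert that z = pack(x, y) meets the puzzle's conditions. */
static void check_pack(uint32_t x, uint32_t y)
{
    uint32_t z = pack(x, y);
    assert((z >> 8) == 0);               /* bits 0:23 are zero */
    assert(bit(z, 25) == bit(x, 0));
    assert(bit(z, 27) == bit(x, 2));
    assert(bit(z, 29) == bit(x, 4));
    assert(bit(z, 30) == bit(x, 16));
    assert(bit(z, 31) == bit(y, 0));
}
```

Under this model, pack(0x80000000, 0) returns 0x40 (x<0> lands in bit 25)
and pack(0, 0x80000000) returns 1 (y<0> lands in bit 31), matching the
result layout shown above.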