Path: utzoo!attcan!uunet!microsoft!w-colinp
From: w-colinp@microsoft.UUCP (Colin Plumb)
Newsgroups: comp.arch
Subject: Re: i860 Dhrystones
Message-ID: <12372@microsoft.UUCP>
Date: 19 Mar 89 01:26:04 GMT
References: <654@cimcor.mn.org> <93088@sun.uucp> <701@pcrat.UUCP> <93452@sun.uucp> <15074@winchester.mips.COM> <210@intelca.intel.com> <15226@winchester.mips.COM>
Reply-To: w-colinp@microsoft.uucp (Colin Plumb)
Organization: very little
Lines: 43

mash@mips.COM (John Mashey) wrote:
> 2) OK, I give up. There must be something unbelievably clever going on
> to use 128-bit loads for C-language string operations. I've looked
> at the i860 Programmer's Reference Manual a bunch, trying to figure
> out how to use either the FP unit or the graphics unit to do this.

Yeah... the Z-buffer check instructions could be used for this, but they're
only available in 16- and 32-bit versions, and you have to test the bits from
the psr, two cycles. And even that would only be 64 bits at a time.

> The string copy on page 9-5 of the manual is the "natural" strcpy
> (which doesn't use anything but byte load/store, and takes about 5
> cycles/byte). I haven't been able to find anything like "branch on any
> byte zero", and the 860 doesn't have unaligned word operations. For a
> fair test, you MUST use str* that only assume byte alignment of operands,
> and you can't inline the str*. The only place I can think of using 128-bit
> loads is in the structure-copy, and it shouldn't be used there,
> unless structures whose largest entities are words are always aligned
> to 4-word boundaries, which seems unlikely.

Well, you can quickly cobble together some code using the
(((x-0x01010101)&~x)&0x80808080) != 0 trick, which tests a word at a time
for a zero byte. This would help in Dhrystone, which, as has been observed,
has unnaturally long strings. If you get this going with a bit of alternation
to allow for load latency, you can get strlen down to about 5 cycles/word.
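
For concreteness, here is a minimal sketch of that zero-byte test in C, with
a strlen built on top of it. The names has_zero_byte and word_strlen are made
up for illustration, and this is not Intel's library code or tuned i860
assembly. It assumes 32-bit words and that an aligned word read may touch up
to 3 bytes past the terminator (harmless on real hardware because an aligned
word never crosses a page, but outside what ISO C guarantees). It makes no
attempt at the load-latency interleaving.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes in x is zero.
 * Subtracting 1 from each byte borrows into bit 7 only where the byte
 * was 0x00 (or >= 0x81); masking with ~x discards the >= 0x80 cases. */
static int has_zero_byte(uint32_t x)
{
    return (((x - 0x01010101u) & ~x) & 0x80808080u) != 0;
}

/* Hypothetical word-at-a-time strlen; only byte alignment of s is assumed. */
static size_t word_strlen(const char *s)
{
    const char *p = s;
    uint32_t w;

    /* Step by bytes until p is 4-byte aligned. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Step by words until one of them contains a zero byte.
     * ASSUMPTION: reading the whole aligned word is safe even if the
     * terminator is in its first bytes. */
    for (;;) {
        memcpy(&w, p, sizeof w);   /* aligned 32-bit load */
        if (has_zero_byte(w))
            break;
        p += sizeof w;
    }

    /* Find the exact zero byte inside that last word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The byte-wise prologue is what lets this meet the "only byte alignment of
operands" condition; everything after it runs on aligned words.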
Strcmp and strcpy would be slower, but would probably be bandwidth-limited.

As for structure copies, what you want is for all structures 4 words or
larger in size to always be 4-word aligned. Intel suggests the stack be kept
this way, for just the same reason. (I admit this is starting to enter the
realm of diminishing returns - you can waste a lot of memory this way - but
it is still feasible.)

> Maybe somebody at Intel would care to post the str* routines
> and educate us?

I posted the instruction set - it's an exercise for the reader. :-)
--
	-Colin (uunet!microsoft!w-colinp)

"Don't listen to me. I never do." - The Doctor