Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ulowell!masscomp!hanko
From: hanko@masscomp.UUCP (Jim Hanko)
Newsgroups: comp.arch
Subject: Re: i860 Dhrystones
Keywords: i860 N10 Floating Point Dhrystones
Message-ID: <955@masscomp.UUCP>
Date: 16 Mar 89 16:56:11 GMT
References: <654@cimcor.mn.org> <93088@sun.uucp> <701@pcrat.UUCP> <93452@sun.uucp> <15074@winchester.mips.COM> <210@intelca.intel.com> <15226@winchester.mips.COM>
Reply-To: hanko@masscomp.UUCP (Jim Hanko)
Organization: Concurrent Computer Corp. - Westford, Ma
Lines: 47

In article <15226@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>In article <210@intelca.intel.com> clif@intelca.intel.com (Ken Shoemaker) writes:
>...
>>The i860 CPU benchmark report had a TYPO: the Dhrystone benchmark used
>>the Greenhill C compiler not FORTRAN.
>>My speculation (note the word speculation) as to why the Dhrystone 
>>numbers are so good is:  ...
>>
>>	128-bit loads for string instructions
>
>
>2) OK, I give up.  There must be something unbelievably clever going on
>to use 128-bit loads for C-language string operations. ...
>...  For a fair test, you MUST
            ^^^^^^^^^
>use str* that only assume byte alignment of operands, and
>you can't inline the str*.  ...
>
>3) Anyway, various people at various companies still can't figure
>out why the number can reasonably be this high, under the
>normal rules, UNLESS there's some really slick trick for
>getting strcpy and strcmp down around 2 cycles/byte.
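For reference, under the rules quoted above (byte alignment only, no inlining), the str* routines are essentially byte-at-a-time loops like the minimal sketch below. The names byte_strcpy and byte_strcmp are hypothetical, and a tuned library version would differ in detail. The point is only that a straightforward version pays for a load, a store or compare, and a NUL test on every byte.

```c
/* Hypothetical out-of-line string routines that assume nothing about
   operand alignment: each byte is loaded, stored or compared, and
   tested for the terminating NUL individually. */
char *byte_strcpy(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;               /* copy up to and including the NUL */
    return dst;
}

int byte_strcmp(const char *a, const char *b)
{
    /* advance while the bytes match and the end has not been reached */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* compare as unsigned char, as the C library strcmp does */
    return (unsigned char)*a - (unsigned char)*b;
}
```

Getting such loops to around 2 cycles/byte is what the quoted paragraph finds hard to believe.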

A couple of years ago I investigated the output of the Green Hills C compiler
on the Dhrystone benchmark (for a different architecture).  I remember being
somewhat surprised to see that the compiler had inlined the strcpy calls.  It
could do this since most of the calls were of the form: 
	strcpy(x, "a constant string");

I believe that it did not actually copy the bytes from memory; instead it
loaded the characters as long immediate values and stored them directly.
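As a rough illustration (a sketch, not the Green Hills output itself), this is what strcpy with a literal source amounts to once the compiler knows the string at compile time: the length is a constant, so neither a loop nor a per-byte NUL test is needed. The names CONST_STR and copy_constant are hypothetical.

```c
#include <string.h>

/* Hypothetical stand-in for the Dhrystone-style call
       strcpy(x, "a constant string");
   "a constant string" is 17 characters plus the NUL. */
#define CONST_STR "a constant string"

void copy_constant(char *x)
{
    /* sizeof includes the terminating NUL, so this copies 18 bytes.
       Because the size is a compile-time constant, a compiler may
       lower the copy to a few word-sized stores of immediate values
       rather than reading the literal from memory byte by byte. */
    memcpy(x, CONST_STR, sizeof CONST_STR);
}
```

Whether a given compiler actually emits immediate stores for this depends on the target and optimization level.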

Although strcpy is called extensively with string constants in Dhrystone,
this is relatively rare in real programs.  Such a compiler feature
therefore seems to be targeted specifically at Dhrystone. 

I can't say whether the Intel version of the compiler has this "optimization"
(or, if it does, whether Intel knew about it), but it may explain the high
numbers.  Can anyone with access to the compiler check this?
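One way to check, assuming the compiler can emit assembly (most do, often via a -S style option, though the exact flag for this compiler is an assumption), is to compile a small probe like the hypothetical one below and compare the two functions. If with_constant contains no call to strcpy while with_variable does, the literal-source call has been inlined.

```c
#include <string.h>

/* Hypothetical probe file. with_constant() uses the Dhrystone-style
   literal source, which a compiler could replace with immediate
   stores. with_variable() copies from a source unknown at compile
   time, so it needs a real strcpy call or a byte loop. */
char probe_buf[32];

void with_constant(void)
{
    strcpy(probe_buf, "a constant string");
}

void with_variable(const char *src)
{
    strcpy(probe_buf, src);
}
```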

I think it would clearly be unfair to compare Dhrystone numbers where this
trick was used to those where a strcpy subroutine was called. 

-
#include <std/disclaimer>

Jim Hanko		{uunet|decvax|harvard|mit-eddie}!masscomp!hanko