Path: utzoo!utgpu!watserv1!watmath!att!occrsh!uokmax!munnari.oz.au!samsung!cs.utexas.edu!swrinde!mips!daver!bungi.com!news
From: dlr@daver.bungi.com (Dave Rand)
Newsgroups: comp.sys.nsc.32k
Subject: Re: Dhrystone 2.1
Message-ID:
Date: 12 Sep 90 08:40:56 GMT
Sender: news@daver.bungi.com
Lines: 215
Approved: news@daver.bungi.com

[In the message entitled "Dhrystone 2.1" on Sep 11, 23:35, John Connin writes:]
>
> Now, for my question. Is the difference between the 8771 that I am getting
> and the reported 11000 figure due only to compiler efficiency, or
> are other factors entering the picture? My naive assumption
> is there must be other factors, since the code generated by GCC
> looks damn good to me.
>

The gcc code is not bad, but it is not outstanding. The code from the
National Semiconductor CTP compiler beats it, as do the Green Hills
compilers. You were correct in looking at strcmp/strcpy, but you didn't
go far enough in optimizing them... there are some _real_ hairy things
you can do to get Dhrystone numbers way up there.

While at National, I had a "challenge" to beat 20,000 Dhrystones/sec
(version 1.1). I was happy when I hit 19,400. I got tired of optimizing
when I hit 30,000 :-)

Here is the gist of what you need to look at. I wrote these routines at
home, based on a neat article in a C programming journal I subscribe to
(I can find the original reference, if you are interested). The magic
here is to look at a complete double-word (32 bits) at a time, storing
it if none of its bytes are zero. I think this worked out to 5 clocks
per byte, which was as good as I could get it. The assembler format is
System V. Perhaps someone else can make it a bit better?

Central to the routine is the concept of treating a 32-bit register as
4 byte values, with "borrows" between them. Subtracting 1 from each of
the byte values (i.e., subtracting 0x01010101 from the register) will
change a 0x00 byte to 0xff and borrow from the byte above it.
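In C, the complete per-word test the routine is built on (the subtraction above, followed by the BIC and AND steps described next) fits in one expression. This is a sketch, not code from the original post: has_zero_byte is a hypothetical name, and uint32_t stands in for a 32-bit register.

```c
#include <assert.h>   /* used by the example checks */
#include <stdint.h>

/* Hypothetical helper: returns a non-zero mask iff some byte of the
 * 32-bit word w is 0x00. It mirrors the subd / bicd / andd sequence
 * in the assembler below:
 *   w - 0x01010101   every 0x00 byte borrows and becomes 0xff
 *   & ~w             BIC: discard bytes whose high bit was already set
 *   & 0x80808080     keep only the per-byte "borrow" (high) bits      */
static uint32_t has_zero_byte(uint32_t w)
{
    return (w - 0x01010101u) & ~w & 0x80808080u;
}
```

With little-endian byte order, 0x00636261 holds "abc" plus its terminator, and has_zero_byte returns 0x80000000 for it; 0x80ff7f01 (several high bits set, but no zero byte) gives 0, which is exactly what the BIC step is for. A zero byte can also flag the byte above it through the borrow: 0x01006261 yields 0x80800000, so only the lowest flagged byte is guaranteed to be a real zero.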
The original value is then masked off with a BIC instruction, which
clears every bit that was already set in the source word, so a byte
that merely had its high bit set cannot be mistaken for a borrow (see,
I told you CISCs are good for something :-). The implied borrow is then
tested for with a simple AND against 0x80808080. If the result is
non-zero, at least one of the bytes must have been zero, and we exit
the loop.

Exiting the loop is interesting code, too... I do a pseudo binary
search to find the zero byte. I haven't tested this code very well, so
do let me know if you find bugs. Neat stuff, all considered!

        .file   "mstrcpy.s"
#
# Very fast replacement for regular C string copy / strcmp routines
#
# Dave Rand
# 09/01/88
#
#
        .globl  _strcpy,_strcmp
        .align  4
_strcpy:
        movd    8(sp),r1        # get source pointer
        movd    4(sp),r2        # get destination pointer
        movd    r5,tos          # save r5
        movd    0(r1),r5        # get source data
        movd    r5,r0           # copy it for the zero test
        subd    $0x01010101,r0  # subtract magic number 1
        bicd    r5,r0           # clear off original bits
        andd    $0x80808080,r0  # test for borrow
        cmpqd   0,r0            # is it zero?
        blo     mvby1:b         # if any byte was zero, exit
        .align  4
lp:     movd    4(r1),r0        # get source data
        movd    r5,0(r2)        # save double in dest.
        addqd   4,r1            # increment source pointer
        addqd   4,r2            # increment destination pointer
        movd    r0,r5           # save it for storage
        subd    $0x01010101,r0  # subtract magic number 1
        bicd    r5,r0           # clear off original bits
        andd    $0x80808080,r0  # test for borrow
        cmpqd   0,r0            # is it zero?
        beq     lp:b            # no, loop
#       .align  4
# r5 contains the data, at least one byte is zero
# r0 has 0x80 set in the zero byte (and possibly in bytes above it)
mvby1:  cmpqw   0,r0            # was it in the first two bytes?
        blo     mv1a:b          # yes, exit now
        cmpd    $0x80000000,r0  # was only byte 3 flagged?
        beq     mv4:b           # yes, it was the last byte
        movw    r5,0(r2)        # no, it was byte 2: save the word
        movb    2(r1),2(r2)     # copy the null byte
        movd    tos,r5          # restore register
        movd    4(sp),r0        # return dest. ptr.
        ret     $(0)
        .align  4
mv4:    movd    r5,0(r2)        # byte 3 was the zero: save all four
        movd    tos,r5          # restore register
        movd    4(sp),r0        # return dest. ptr.
        ret     $(0)
        .align  4
mv1a:   cmpb    $0x80,r0        # was it byte zero?
        beq     mv1:b           # yes, exit
mv2:    movw    r5,0(r2)        # byte 1 was the zero: save the word
        movd    tos,r5          # restore register
        movd    4(sp),r0        # return dest. ptr.
        ret     $(0)
        .align  4
mv1:    movb    r5,0(r2)        # byte 0 was the zero: save the byte
        movd    tos,r5          # restore register
        movd    4(sp),r0        # return dest. ptr.
        ret     $(0)
        .align  4
_strcmp:
        movd    8(sp),r1        # get s2 (second argument)
        movd    4(sp),r2        # get s1 (first argument)
        movd    r5,tos          # save r5
        movd    0(r1),r5        # get s2 data
        movd    r5,r0           # copy it for the zero test
        subd    $0x01010101,r0  # subtract magic number 1
        bicd    r5,r0           # clear off original bits
        andd    $0x80808080,r0  # test for borrow
        cmpqd   0,r0            # is it zero?
        blo     cpxit:w         # if any byte was zero, exit
        .align  4
cplp:   movd    4(r1),r0        # get next s2 data
        cmpd    r5,0(r2)        # compare the two
        bne     cpxit1a:b       # exit if not equal
        addqd   4,r1            # increment s2 pointer
        addqd   4,r2            # increment s1 pointer
        movd    r0,r5           # keep the new s2 data in r5
        subd    $0x01010101,r0  # subtract magic number 1
        bicd    r5,r0           # clear off original bits
        andd    $0x80808080,r0  # test for borrow
        cmpqd   0,r0            # is it zero?
        beq     cplp:b          # no, loop
        br      cpxit:w
        .align  4
cpxit1a:
# the 4 current bytes don't match. Find out why.
# There is no zero byte in the current 4 bytes of s2
cpxit1: cmpw    r5,0(r2)        # do the first two bytes differ?
        bne     cpx3a1:b        # yes, exit now
        cmpb    2(r1),2(r2)     # does byte 2 differ?
        bne     cpx3c:b         # yes...
        movb    3(r2),r0        # get s1 byte 3
        subb    3(r1),r0        # subtract s2 byte 3
        movxbd  r0,r0           # sign extended return value
        movd    tos,r5          # pop saved register
        ret     $(0)
cpx3a1: cmpb    r5,0(r2)        # does byte 0 differ?
        bne     cpx3a:b         # yes, exit now
        movb    1(r2),r0        # get s1 byte 1
        subb    1(r1),r0        # subtract s2 byte 1
        movxbd  r0,r0           # sign extend return value
        movd    tos,r5          # pop saved register
        ret     $0
        .align  4
cpx3a:  movb    0(r2),r0        # get s1 byte 0
        subb    r5,r0           # subtract s2 byte 0
        movxbd  r0,r0           # sign extend return value
        movd    tos,r5          # pop saved register
        ret     $0
        .align  4
cpx3c:  movb    2(r2),r0        # get s1 byte 2
        subb    2(r1),r0        # subtract s2 byte 2
        movxbd  r0,r0           # sign extend return value
        movd    tos,r5          # pop saved register
        ret     $0
        .align  4
# 1 of the 4 current bytes is zero.
# check to see what it means: compare each byte first,
# then test it for the terminator
cpxit:  movb    0(r2),r0        # get s1 byte 0
        cmpb    r0,r5           # does it match s2 byte 0?
        bne     cpx1:w          # no, exit now
        cmpqb   0,r5            # matched; is it the terminator?
        beq     cpx2:w          # yes, strings are equal
        movb    1(r1),r5        # get s2 byte 1
        movb    1(r2),r0        # get s1 byte 1
        cmpb    r0,r5           # does it match?
        bne     cpx1:b          # no, exit now
        cmpqb   0,r5            # is it the terminator?
        beq     cpx2:b          # yes, strings are equal
        movb    2(r1),r5        # get s2 byte 2
        movb    2(r2),r0        # get s1 byte 2
        cmpb    r0,r5           # does it match?
        bne     cpx1:b          # no, exit now
        cmpqb   0,r5            # is it the terminator?
        beq     cpx2:b          # yes, strings are equal
        movb    3(r1),r5        # s2 byte 3 must be the zero
        movb    3(r2),r0        # get s1 byte 3
        .align  4
cpx1:   subb    r5,r0           # diff s1 - s2 (sign valid for 7-bit chars)
        movxbd  r0,r0           # sign extended return value
        movd    tos,r5          # pop saved register
        ret     $(0)
        .align  4
cpx2:   movqd   0,r0            # strings are equal
        movd    tos,r5          # pop saved register
        ret     $0              # and return

--
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr
Internet: dlr@daver.bungi.com
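For anyone who wants to play with the strcpy above without an NS32000, here is a rough C rendering of it, including the pseudo binary search on exit. It is a sketch, not Dave's code: word_strcpy, zero_mask, load_le32 and store_le32 are hypothetical names, bytes are packed in the NS32000's little-endian order, and, like the assembler, it reads the source a whole word at a time, so it assumes up to 3 bytes past the source terminator are readable.

```c
#include <assert.h>   /* used by the example checks */
#include <stdint.h>
#include <string.h>   /* used by the example checks */

/* Load/store 4 bytes in little-endian order (byte 0 = least significant). */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store_le32(unsigned char *p, uint32_t w)
{
    p[0] = (unsigned char)w;
    p[1] = (unsigned char)(w >> 8);
    p[2] = (unsigned char)(w >> 16);
    p[3] = (unsigned char)(w >> 24);
}

/* Non-zero iff some byte of w is 0x00 (the subd/bicd/andd sequence). */
static uint32_t zero_mask(uint32_t w)
{
    return (w - 0x01010101u) & ~w & 0x80808080u;
}

/* Hypothetical word-at-a-time strcpy. ASSUMPTION: like the assembler,
 * it may read up to 3 bytes past the source terminator. */
static char *word_strcpy(char *dst, const char *src)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    uint32_t w = load_le32(s);
    uint32_t m;

    while ((m = zero_mask(w)) == 0) {   /* no zero byte: store whole word */
        store_le32(d, w);
        s += 4;
        d += 4;
        w = load_le32(s);
    }

    /* Pseudo binary search: a borrow can also flag the byte above the
     * real terminator, so always trust the lowest flagged byte. */
    d[0] = (unsigned char)w;
    if (m & 0x0000ffffu) {              /* terminator is byte 0 or 1 */
        if ((m & 0x000000ffu) == 0)     /* byte 0 clear: byte 1 is the zero */
            d[1] = (unsigned char)(w >> 8);
    } else {                            /* terminator is byte 2 or 3 */
        d[1] = (unsigned char)(w >> 8);
        d[2] = (unsigned char)(w >> 16);
        if ((m & 0x00ff0000u) == 0)     /* byte 2 clear: byte 3 is the zero */
            d[3] = (unsigned char)(w >> 24);
    }
    return dst;
}
```

For example, copying from char src[16] = "hello" into a buffer pre-filled with 'X' leaves "hello" in the destination with dst[6] still 'X'. Copying the bytes 'a', 'b', 0, 1 stops at the terminator in byte 2 even though zero_mask also flags byte 3, which is why the sketch tests the lower byte's flag rather than comparing the whole mask against 0x80000000 with an unsigned "lower or same".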