Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!chinacat!uudell!loft386!dpi From: dpi@loft386.uucp (Doug Ingraham) Newsgroups: comp.bugs.sys5 Subject: Re: crufty memcmp on 386 Summary: Corrected 386 version enclosed. Message-ID: <1990Nov18.210755.9978@loft386.uucp> Date: 18 Nov 90 21:07:55 GMT References: <1990Nov12.173812.3227@cbnewsh.att.com> <1990Nov15.163550.27015@cbnewsh.att.com> Distribution: na Organization: Lofty Pursuits Public Access Unix for Rapid City, SD USA Lines: 134 In article <1990Nov15.163550.27015@cbnewsh.att.com>, gls@cbnewsh.att.com (Col. G. L. Sicherman) writes: > In <1990Nov13.182133.27748@loft386.uucp>, dpi@loft386.uucp writes: > > > > I can't make the original fail. Could you provide a short code segment > > that will display the claimed behavior? If you can make AT&T's fail, > > I would like to know if mine fails also. > > Sure. Here are my test program and what came out of it: [Test program deleted] I coded my routine from a Draft of the standard. The version provided with SYS V 386 performs a signed test which was acceptable in pre standard times to most. I have since updated my routine and it now does unsigned comparisons. Col Sicherman, I wrote my test routine to test for the signed version and wrote my original to match the version supplied by AT&T. This one passes your test and my more complete test. Feel free to use it in anyway you wish for any purpose. It is as bug free as I can make it. If anyone finds a problem or an improvement please let me know. ------------------------------------------------------------------------- .file "memcmp.s" / The Intel 80386 implementation of the ANSI memcmp() function. / Written April 8, 1990 by Doug Ingraham. dpi@loft386.UUCP / Copyright 1990 by Douglas P. Ingraham. This code is freely / distributable so long as this notice remains intact. / / Corrected November 18, 1990 to compare unsigned as per the standard. / .text .align 4 .globl memcmp memcmp: / Save edi and esi in edx and eax to save 4 clocks over popping them movl %edi,%edx / 2 clocks 2 bytes movl %esi,%eax / 2 clocks 2 bytes / esi is pointer to the first string movl 0x4(%esp),%esi / 2 clocks 4 bytes / edi is pointer to the second string movl 0x8(%esp),%edi / 2 clocks 4 bytes / This tests to see if the pointers are to the same string. / This test is probably a waste of time. cmpl %esi,%edi / 2 clocks 2 bytes je same / 7+m,3 clocks 2 bytes / The ecx gets the count. If Zero then by definition they are the same. movl 0xC(%esp),%ecx / 2 clocks 4 bytes jcxz same / 9+m,5 clocks 2 bytes / Test the count to make sure there are enough bytes to bother with this / speed optimization. 12 was chosen as a ballpark figure. Must be greater / than 7 as a minimum. cmpl $12,%ecx / 2 clocks 3 bytes jl byte / 7+m,3 clocks 2 bytes / Test to see if either pointer is word aligned. Most is 3 cmpsb's to align / so the loop is unrolled. 75 clocks worst case. testl $3,%esi / 2 clocks 6 bytes jz is_aligned / 7+m,3 clocks 2 bytes testl $3,%edi / 2 clocks 6 bytes jz is_aligned / 7+m,3 clocks 2 bytes cmpsb / 10 clocks 1 byte jne different / 7+m,3 clocks 2 bytes decl %ecx / 2 clocks 1 byte testl $3,%esi / 2 clocks 6 bytes jz is_aligned / 7+m,3 clocks 2 bytes testl $3,%edi / 2 clocks 6 bytes jz is_aligned / 7+m,3 clocks 2 bytes cmpsb / 10 clocks 1 byte jne different / 7+m,3 clocks 2 bytes decl %ecx / 2 clocks 1 byte testl $3,%esi / 2 clocks 6 bytes jz is_aligned / 7+m,3 clocks 2 bytes testl $3,%edi / 2 clocks 6 bytes jz is_aligned / 7+m,3 clocks 2 bytes cmpsb / 10 clocks 1 byte jne different / 7+m,3 clocks 2 bytes decl %ecx / 2 clocks 1 byte / Save the modified count back on the stack for possible later use. is_aligned: movl %ecx,0xC(%esp) / 2 clocks 4 bytes / Divide by 4 so that we can do word compares shrl $2,%ecx / 3 clocks 3 bytes / Do the long level search. repz; cmpsl / 5+9*n clocks 2 bytes je byte_equal / 7+m,3 clocks 2 bytes / backup 4 bytes movl $4,%ecx / 2 clocks 5 bytes subl %ecx,%esi / 2 clocks 2 bytes subl %ecx,%edi / 2 clocks 2 bytes / Do the byte level search. byte: repz; cmpsb / 5+9*n clocks 2 bytes je same / 7+m/3 clocks 2 bytes / Restore the esi so that eax is available for return value. different: movl %eax,%esi / 2 clocks 2 byte / If the carry is set, return -1 jc less_than / 7+m/3 clocks 2 bytes greater_than: mov $1,%eax / 2 clocks 5 bytes / Restore the edi. movl %edx,%edi / 2 clocks 2 bytes ret / 10+m clocks 1 byte / If the string compared out then come here. same: movl %eax,%esi / 2 clocks 2 bytes short_same: movl %edx,%edi / 2 clocks 2 bytes xorl %eax,%eax / 2 clocks 2 bytes ret / 10+m clocks 1 byte / come here if the high speed test was equal byte_equal: movl 0xC(%esp),%ecx / 2 clocks 4 bytes andl $3,%ecx / 2 clocks 6 bytes repz; cmpsb / 5+9*n clocks 2 bytes / Restore the esi so that eax is available for return value. movl %eax,%esi / 2 clocks 2 byte je short_same / 7+m/3 clocks 2 bytes jnc greater_than / 7+m/3 clocks 2 bytes less_than: mov $-1,%eax / 2 clocks 5 bytes / Restore the edi. movl %edx,%edi / 2 clocks 2 bytes ret / 10+m clocks 1 byte ------------------------------------------------------------------------ > -- > G. L. Sicherman > gls@odyssey.att.COM Thanks for finding this. -- Doug Ingraham (SysAdmin) Lofty Pursuits (Public Access for Rapid City SD USA) bigtex!loft386!dpi uunet!loft386!dpi