Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!chinacat!uudell!loft386!dpi From: dpi@loft386.uucp (Doug Ingraham) Newsgroups: comp.bugs.sys5 Subject: Re: crufty memcmp on 386 Summary: here is a better implmentation! Message-ID: <1990Nov13.182133.27748@loft386.uucp> Date: 13 Nov 90 18:21:33 GMT References: <1990Nov12.173812.3227@cbnewsh.att.com> Distribution: na Organization: Lofty Pursuits Public Access Unix for Rapid City, SD USA Lines: 183 In article <1990Nov12.173812.3227@cbnewsh.att.com>, gls@cbnewsh.att.com (Col. G. L. Sicherman) writes: > There is a problem with the memcmp routine on my 6386WGS running Vr3.2. > The ordering it returns is not transitive, or even antisymmetric. In > other words, it can tell you that x < y and y < x, or that x < y < z < x. > > I do not know whether the formal specifications for memcmp require that > it produces a lineal ordering, but obviously it should. A nonlineal > memcmp cannot be used for sorting and searching. The ANSI spec that requires the behavior is in section 4.11.4 on page 165 of the standard. 4.11.4 Comparison Functions The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared. 4.11.4.1 The memcmp Function Synopsis #include int memcmp(const void *s1, const void *s2, size_t n); Description The memcmp function compares the first n characters of the object pointed to by s1 to the first n characters of the object pointed to by s2. Returns The memcmp function returns an integer greater than, equal to, or less than zero, accordingly as the object pointed to by s1 is greater than, equal to, or less than the object pointed to by s2. I wrote an exhaustive test for the purposes of testing my version of this function (included at the end) but the version supplied with Bell Tech's version of System V/386 3.2 passes my test. > > The 386-specific code does this: > > jz zeroexit > lahf // load low byte of flags to high byte of AX > cwtl // convert word to longword with sign extension > > There are two problems with this. One is that if EAX contains zero and > all the flag bits are 0, it will return 0 instead of positive. This > never seems to happen; apparently the unused bit 1 is always set to 1. The lahf instruction converts the 8086 16 bit flags register to the 8080 8 bit format. The jz before the lahf is used to ensure that a zero is returned only when equalty is true. The not equal case which is what is being determined is the only case where the lahf cwtl comes into play. You will get a negative number or a positive non-zero from the output of this sequence. Probably the comparison operation on the previous line was a rep cmpsb which will compare the two bytes via an 8 bit subtraction and set the flags. One of the unused bits is set to a 1 and was specified to do that on the 8085 (I think) so this is a very safe assumption. > Is anybody working on a fix for the code and (if necessary) the formal > specs for memcmp()? I can't make the original fail. Could you provide a short code segment that will display the claimed behavior? If you can make AT&T's fail, I would like to know if mine fails also. The official specs are copied above and the following is my replacement routine that can be almost 4 times faster than the AT&T implementation. ------------------------------------------------------------------------ .file "memcmp.s" / The Intel 80386 implementation of the ANSI memcmp() function. / Written April 8, 1990 by Doug Ingraham. dpi@loft386.UUCP / Copyright 1990 by Douglas P. Ingraham. This code is freely / distributable so long as this notice remains intact. .text .align 4 .globl memcmp memcmp: / Save edi and esi in edx and eax to save 4 clocks over popping them movl %edi,%edx / 2 clocks 2 bytes movl %esi,%eax / 2 clocks 2 bytes / esi is pointer to the first string movl 0x4(%esp),%esi / 2 clocks 4 bytes / edi is pointer to the second string movl 0x8(%esp),%edi / 2 clocks 4 bytes / This tests to see if the pointers are to the same string. cmpl %esi,%edi / 2 clocks 2 bytes je same / 7+m,3 clocks 2 bytes / The ecx gets the count. If Zero then by definition they are the same. movl 0xC(%esp),%ecx / 2 clocks 4 bytes jcxz same / 9+m,5 clocks 2 bytes / Test the count to make sure there are enough bytes to bother with this / speed optimization. 12 was chosen as a ballpark figure. Must be greater / than 7 as a minimum. cmpl $12,%ecx / 2 clocks 3 bytes jl byte / 7+m,3 clocks 2 bytes / Test to see if either pointer is word aligned. Most is 3 cmpsb's to align / so the loop is unrolled. 75 clocks worst case. testl $3,%esi / 2 clocks 6 bytes jz is_aligned / 7+m,3 clocks 2 bytes testl $3,%edi / 2 clocks 6 bytes jz is_aligned / 7+m,3 clocks 2 bytes cmpsb / 10 clocks 1 byte jne different / 7+m,3 clocks 2 bytes decl %ecx / 2 clocks 1 byte testl $3,%esi / 2 clocks 6 bytes jz is_aligned / 7+m,3 clocks 2 bytes testl $3,%edi / 2 clocks 6 bytes jz is_aligned / 7+m,3 clocks 2 bytes cmpsb / 10 clocks 1 byte jne different / 7+m,3 clocks 2 bytes decl %ecx / 2 clocks 1 byte testl $3,%esi / 2 clocks 6 bytes jz is_aligned / 7+m,3 clocks 2 bytes testl $3,%edi / 2 clocks 6 bytes jz is_aligned / 7+m,3 clocks 2 bytes cmpsb / 10 clocks 1 byte jne different / 7+m,3 clocks 2 bytes decl %ecx / 2 clocks 1 byte / Save the modified count back on the stack for possible later use. is_aligned: movl %ecx,0xC(%esp) / 2 clocks 4 bytes / Divide by 4 so that we can do word compares shrl $2,%ecx / 3 clocks 3 bytes / Do the long level search. repz; cmpsl / 5+9*n clocks 2 bytes je byte_equal / 7+m,3 clocks 2 bytes / backup 4 bytes movl $4,%ecx / 2 clocks 5 bytes subl %ecx,%esi / 2 clocks 2 bytes subl %ecx,%edi / 2 clocks 2 bytes / Do the byte level search. byte: repz; cmpsb / 5+9*n clocks 2 bytes je same / 7+m/3 clocks 2 bytes / Restore the esi so that eax is available for return value. different: movl %eax,%esi / 2 clocks 2 byte / This is a trick to put a positive or negative in the eax. lahf / 2 clocks 1 byte cwtl / 3 clocks 1 byte / Restore the edi. movl %edx,%edi / 2 clocks 2 bytes ret / 10+m clocks 1 byte / If the string compared out then come here. same: movl %eax,%esi / 2 clocks 2 bytes short_same: movl %edx,%edi / 2 clocks 2 bytes xorl %eax,%eax / 2 clocks 2 bytes ret / 10+m clocks 1 byte / come here if the high speed test was equal byte_equal: movl 0xC(%esp),%ecx / 2 clocks 4 bytes andl $3,%ecx / 2 clocks 6 bytes repz; cmpsb / 5+9*n clocks 2 bytes / Restore the esi so that eax is available for return value. movl %eax,%esi / 2 clocks 2 byte je short_same / 7+m/3 clocks 2 bytes / This is a trick to put a positive or negative in the eax. lahf / 2 clocks 1 byte cwtl / 3 clocks 1 byte / Restore the edi. movl %edx,%edi / 2 clocks 2 bytes ret / 10+m clocks 1 byte ------------------------------------------------------------------------ It is quite a bit longer than the original, but is much faster in the general case. Apologies to those with other architectures. -- Doug Ingraham (SysAdmin) Lofty Pursuits (Public Access for Rapid City SD USA) bigtex!loft386!dpi uunet!loft386!dpi