Path: utzoo!attcan!uunet!tut.cis.ohio-state.edu!mstar!crappie.MorningStar.Com!karl
From: karl@MorningStar.Com (Karl Fox)
Newsgroups: alt.sources.d
Subject: Re: Fast strcmp() wanted.
Message-ID: <1990Oct5.122245.392@crappie.MorningStar.Com>
Date: 5 Oct 90 12:22:45 GMT
References: <1646@cherry.edc.UUCP> <1990Sep27.151543.8025@ccs.carleton.ca> <6003@hplabsz.HPL.HP.COM> <1145@exodus.Eng.Sun.COM>

cedman@lynx.ps.uci.edu (Carl Edman) writes:
   In article <1145@exodus.Eng.Sun.COM> falk@peregrine.Sun.COM (Ed Falk) writes:
      In article <6003@hplabsz.HPL.HP.COM> sartin@hplabs.hp.com (Rob Sartin) writes:

   Also, these two strings

      "ab\0x"  "ab\0y"

   (where x and y are any garbage that happens to be in memory after
   the terminating '\0') will be evaluated as unequal.

It ain't necessarily so.  Any correct implementation will take into
account only the data actually in the string.  Saying "32-bit CRC"
does not imply reading the string 4 bytes at a time.

   There are workarounds for both problems, of course, but I think
   there won't be much of a performance improvement after you've done
   all it requires to get it right.

This is wrong too; when I added "string comparison hashing" to a diff
program, it more than doubled the speed.  For something like diff,
8-bit or 16-bit values are good enough, assuming an adequate hashing
function.

   You can even imagine systems where it could be possible to remove
   the actual string from memory, and simply assume that if the 32-bit
   CRC match, the strings match. Such systems would have to be
   tolerant about an occasional mismatch.

   If memory serves correctly the above approach is used in some
   implementations for diff. (only to give one practical, real world
   example)

Diff still needs to compare the actual strings if the hash values
match.  Having a diff that is some amount faster but that "is
occasionally wrong" wouldn't be too popular, I'd think.
--
"I hear you guys deal with such dreck  | Karl Fox, Morning Star Technologies
as SNA and X.25."  -Ed Vielmetti       | karl@MorningStar.Com
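
For anyone who wants to see the hash-then-compare idea from the post
in code, here is a minimal sketch in C.  It is not the code from the
diff program mentioned above.  The names hashed_line, hash_string,
make_line and lines_equal are made up for illustration, and the
multiply-by-31 hash is only an assumed stand-in for whatever
"adequate hashing function" one prefers.  The hash is computed once
per string, reads bytes only up to the terminating '\0', and is kept
to 16 bits.  A hash mismatch proves the strings differ; a hash match
still falls through to strcmp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: a string plus a hash computed once, up front. */
struct hashed_line {
    const char *text;     /* NUL-terminated string */
    unsigned short hash;  /* low 16 bits of the hash of text */
};

/*
 * Assumed hash function (multiply by 31, add next byte).  It walks the
 * string one byte at a time and stops at the terminating NUL, so any
 * garbage stored after the NUL never influences the result.
 */
static unsigned short hash_string(const char *s)
{
    unsigned long h = 0;

    while (*s != '\0')
        h = h * 31 + (unsigned char)*s++;
    return (unsigned short)(h & 0xFFFFu);
}

/* Pair a string with its hash so later comparisons can reuse it. */
static struct hashed_line make_line(const char *text)
{
    struct hashed_line line;

    assert(text != NULL);
    line.text = text;
    line.hash = hash_string(text);
    return line;
}

/*
 * Returns nonzero if the two strings are equal.  Different hashes mean
 * the strings must differ, so strcmp() is skipped.  Equal hashes prove
 * nothing by themselves, so the actual strings are compared.
 */
static int lines_equal(const struct hashed_line *a,
                       const struct hashed_line *b)
{
    if (a->hash != b->hash)
        return 0;
    return strcmp(a->text, b->text) == 0;
}
```

The gain, if any, comes from comparing each string against many
others, as diff does with lines: the hash is paid for once and most
mismatches are then rejected with a single integer comparison.  How
much that buys depends on the data and on the hash chosen.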