Path: utzoo!attcan!uunet!tut.cis.ohio-state.edu!mstar!crappie.MorningStar.Com!karl
From: karl@MorningStar.Com (Karl Fox)
Newsgroups: alt.sources.d
Subject: Re: Fast strcmp() wanted.
Message-ID: <1990Oct5.122245.392@crappie.MorningStar.Com>
Date: 5 Oct 90 12:22:45 GMT
References: <1646@cherry.edc.UUCP> <OTTO.90Sep27145643@tukki.jyu.fi>
	<CEDMAN.90Sep27075013@lynx.ps.uci.edu>
	<1990Sep27.151543.8025@ccs.carleton.ca>
	<CEDMAN.90Sep29091115@lynx.ps.uci.edu> <6003@hplabsz.HPL.HP.COM>
	<1145@exodus.Eng.Sun.COM> <CEDMAN.90Oct4171710@lyn
Sender: usenet@crappie.MorningStar.Com
Reply-To: karl@MorningStar.Com (Karl Fox)
Organization: Morning Star Technologies
Lines: 40
In-Reply-To: cedman@lynx.ps.uci.edu's message of 5 Oct 90 00:17:10 GMT

In article <CEDMAN.90Oct4171710@lynx.ps.uci.edu> cedman@lynx.ps.uci.edu (Carl Edman) writes:

   In article <1145@exodus.Eng.Sun.COM> falk@peregrine.Sun.COM (Ed Falk) writes:
      In article <6003@hplabsz.HPL.HP.COM> sartin@hplabs.hp.com (Rob Sartin) writes:
      Also, these two strings

	      "ab\0x"
	      "ab\0y"

      (where x and y are any garbage that happens to be in memory after the
      terminating '\0') will be evaluated as unequal.

It ain't necessarily so.  Any correct implementation will hash only
the bytes actually in the string, stopping at the terminating '\0'.
Saying "32-bit CRC" describes the width of the result; it does not
imply reading the string 4 bytes at a time.
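
As a minimal sketch of the point (not code from any particular
program): the hypothetical str_hash() below walks the string one byte
at a time and stops at the '\0', so whatever garbage follows the
terminator never enters the result.  It uses 32-bit FNV-1a as a
stand-in for a CRC, purely to keep the example short; the name and
the choice of hash are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit FNV-1a over the bytes before the '\0'.
 * FNV-1a stands in for a CRC here; the point is only that the loop
 * never looks past the terminator. */
uint32_t str_hash(const char *s)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */

    while (*s != '\0') {
        h ^= (unsigned char)*s++;      /* mix in one byte at a time */
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}
```

With this, the buffers "ab\0x" and "ab\0y" hash identically, because
only 'a' and 'b' are ever read.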

      There are workarounds for both problems, of course, but I think there
      won't be much of a performance improvement after you've done all it
      requires to get it right.

This is wrong too; when I added "string comparison hashing" to a diff
program, it more than doubled the speed.  For something like diff,
8-bit or 16-bit hash values are good enough, assuming an adequate
hashing function.

   You can even imagine systems where it could be possible to remove
   the actual string from memory, and simply assume that if the 32-bit
   CRCs match, the strings match. Such systems would have to be tolerant
   about an occasional mismatch.

   If memory serves correctly the above approach is used in some
   implementations for diff. (only to give one practical, real world
   example)

Diff still needs to compare the actual strings if the hash values
match.  Having a diff that is some amount faster but that "is
occasionally wrong" wouldn't be too popular, I'd think.
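
A rough sketch of that arrangement, under stated assumptions: each
line carries a precomputed 16-bit hash, unequal hashes settle the
comparison at once, and equal hashes fall through to strcmp().  The
names struct hline, hline_init() and hline_equal(), and the simple
multiply-by-31 hash, are all hypothetical and not taken from any
actual diff source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-line record: the text plus a hash computed once. */
struct hline {
    const char *text;
    uint16_t    hash;
};

/* Simple 16-bit multiplicative hash, chosen only for illustration. */
static uint16_t line_hash(const char *s)
{
    uint16_t h = 0;

    while (*s != '\0')
        h = (uint16_t)(h * 31u + (unsigned char)*s++);
    return h;
}

void hline_init(struct hline *l, const char *text)
{
    l->text = text;
    l->hash = line_hash(text);
}

/* Returns nonzero if the two lines are equal. */
int hline_equal(const struct hline *a, const struct hline *b)
{
    if (a->hash != b->hash)
        return 0;                           /* cheap rejection */
    return strcmp(a->text, b->text) == 0;   /* confirm a real match */
}
```

The hash can only prove two lines different; a hash match is a hint,
and the strcmp() call is what keeps the answer correct when two
different lines happen to collide.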
--
"I hear you guys deal with such dreck  |  Karl Fox, Morning Star Technologies
as SNA and X.25."       -Ed Vielmetti  |  karl@MorningStar.Com