Path: utzoo!attcan!utgpu!cs.utexas.edu!swrinde!ucsd!orion.oac.uci.edu!cedman
From: cedman@lynx.ps.uci.edu (Carl Edman)
Newsgroups: alt.sources.d
Subject: Re: Fast strcmp() wanted.
Message-ID: <CEDMAN.90Oct6100336@lynx.ps.uci.edu>
Date: 6 Oct 90 17:03:34 GMT
References: <1646@cherry.edc.UUCP> <OTTO.90Sep27145643@tukki.jyu.fi>
	<CEDMAN.90Sep27075013@lynx.ps.uci.edu>
	<1990Sep27.151543.8025@ccs.carleton.ca>
	<CEDMAN.90Sep29091115@lynx.ps.uci.edu> <6003@hplabsz.HPL.HP.COM>
	<1145@exodus.Eng.Sun.COM> <CEDMAN.90Oct4171710@lyn
Organization: non serviam
Lines: 46
Nntp-Posting-Host: lynx.ps.uci.edu
In-reply-to: karl@MorningStar.Com's message of 5 Oct 90 12:22:45 GMT

In article <1990Oct5.122245.392@crappie.MorningStar.Com> karl@MorningStar.Com (Karl Fox) writes:
   I write:
      You can even imagine systems where it could be possible to remove
      the actual string from memory, and simply assume that if the 32-bit
      CRC match, the strings match. Such systems would have to be tolerant
      about an occasional mismatch.

      If memory serves correctly the above approach is used in some
      implementations for diff. (only to give one practical, real world
      example)

   Diff still needs to compare the actual strings if the hash values
   match.  Having a diff that is some amount faster but that "is
   occasionally wrong" wouldn't be too popular, I'd think.

Now, if I remember correctly, that wasn't really a problem. I haven't
got the source in front of me , but I remember it goes something like
this.

    1. Read both files , calculate the CRC for each line, and then forget
       the files themselves

    2. Do the standard algorithm based on the CRCs alone

    3. Check the correctness of the CRC comparison.
       If it is correct, fine
       If it isn't then
           4. print a warning message
           5. add another diff command to the output to make up for the
              inequality of these 2 lines.

This algorithm was quite fast, and allowed the comparison of truely
giantic files. The only penalty you pay for an incorrect comparison
was a (possibly) suboptimal diff, but that was a small and relatively
infrequent problem. So you could rely on the correctness of your
diff output.

	Carl Edman


Theorectial Physicist,N.:A physicist whose   | Send mail
existence is postulated, to make the numbers |  to
balance but who is never actually observed   | cedman@golem.ps.uci.edu
in the laboratory.                           | edmanc@uciph0.ps.uci.edu