Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!wuarchive!udel!rochester!kodak!ispd-newsserver!ism.isc.com!bud.sos.ivy.isc.com!willcr From: willcr@bud.sos.ivy.isc.com (Will Crowder) Newsgroups: comp.lang.c Subject: Re: Efficient STRing CoMPares? Message-ID: <1991Mar21.175913.20196@ism.isc.com> Date: 21 Mar 91 17:59:13 GMT References: <15486@smoke.brl.mil> <1991Mar19.034802.24340@cbnewsk.att.com> <1991Mar20.020957.9180@athena.mit.edu> <1991Mar20.213615.11223@noose.ecn.purdue.edu> Sender: usenet@ism.isc.com (Ism Usenet News) Reply-To: willcr@ivy.isc.com Followup-To: comp.lang.c Organization: Interactive Systems Corp. Lines: 57 In article <1991Mar20.213615.11223@noose.ecn.purdue.edu>, psmith@iies.ecn.purdue.edu (Paul F Smith) writes: |> >In article barton@cbnewsk.att.com (jim.m.barton) writes: |> > |> > struct chunk |> > { |> > char *c_ptr; |> > unsigned int c_hash; |> > }; |> > |> ... |> > |> >You can also write great little symbol table implementations |> >using hashing, and the hash values don't even have to take up |> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |> >extra space (they're implicit in the bucket or slot indices). |> ^^^^^^^^^^^!!! |> > |> > Steve Summit |> > scs@adam.mit.edu |> |> OK, I won't ask about a string hash function :-), but could you |> please elaborate on how your symbol table works. I just implemented a |> hash table that keeps the hash value around (like your struct above), |> which I then use in scanning the list at the given bucket to minimize |> strcmp()s. But, I can't see how you can get rid of it and still reduce |> the number of strcmp()s. Please describe, Thanks! (Since we're talking about one of my favorite data structures, hashing with open chaining, I'll take a shot at this one.) Hashing with open chaining is a method of saving information such that it can be looked up quickly. The basic idea is to deal with hash collisions by maintaining a list (usually a linked-list) of data with identical hash values. Then each hash "bucket" points to the list of data having the same hash value. To look up a key, they key is hashed and the chain having that hash value is scanned for the data. Since all data in a given chain have the same hash value, there is no need to save the hash value anywhere. One optimization might be to further hash the data in each chain, obviously with a different hashing algorithm, and then rehash the key and scan the chain for data having the same new hash value. In this case, you would have to save the new hash value along with the data. You could also have chains on top of chains, etc., etc., but a simple single-level algorithm with one set of buckets is usually efficient enough. |> ------------------------------------------------------------------ |> Paul F. Smith - ADPC - Purdue University psmith@ecn.purdue.edu |> Hope this helps, Will -------------------------------------------------------------------------------- Will Crowder, MTS | "I belong to no organized politcal party. (willcr@ivy.isc.com) | I'm a democrat." INTERACTIVE Systems Corp. | -- Will Rogers