Path: utzoo!utstat!jarvis.csri.toronto.edu!mailrus!iuvax!watmath!looking!brad
From: brad@looking.on.ca (Brad Templeton)
Newsgroups: news.software.b
Subject: Re: Not Checksums, how about string search hash codes?
Message-ID: <50572@looking.on.ca>
Date: 21 Nov 89 04:11:26 GMT
References: <49602@looking.on.ca> <1989Nov20.073448.18953@twwells.com>
Organization: Looking Glass Software Ltd.
Lines: 15
Class: discussion

In article <1989Nov20.073448.18953@twwells.com> bill@twwells.com (T. William Wells) writes:
>This is easy enough. Just take any hashing function and use it to
>set bits in a bit table for each word in the article. If a hashed
>word creates a bit that is not in the table, the word is not in
>the article.

Is that all it is?  Seems that would require a lot of bits.  There were
151 unique words in your article, so you would need a 1500 bit table
(188 bytes, or 226 bytes of printable characters) to get a 10% hit ratio
using this algorithm.

Of course you could have the hit ratio vary according to the article,
but 226 seems like a lot.  I thought there were more compact algorithms.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
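
For anyone who wants to check the figures above, here is a rough Python sketch of the bit-table scheme as quoted: one hash per word, one bit set per hash. The hash (zlib.crc32), the tokenizer regex, and all function names are my own assumptions for illustration; neither poster specified them. The 1500-bit table size and the 151-word count come from the post.

```python
import math
import re
import zlib

TABLE_BITS = 1500  # table size quoted in the post


def words(text):
    """Unique lowercase words in text (the tokenizer is an assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def bit_index(word, table_bits=TABLE_BITS):
    """Map a word to one bit position; crc32 is only a stand-in hash."""
    return zlib.crc32(word.encode("utf-8")) % table_bits


def build_table(text, table_bits=TABLE_BITS):
    """Set one bit per unique word; a Python int serves as the bit table."""
    table = 0
    for w in words(text):
        table |= 1 << bit_index(w, table_bits)
    return table


def might_contain(table, word, table_bits=TABLE_BITS):
    """False means the word is definitely absent; True may be a false hit."""
    return bool((table >> bit_index(word.lower(), table_bits)) & 1)


def expected_hit_ratio(unique_words, table_bits=TABLE_BITS):
    """Expected fraction of bits set, i.e. the chance that a word not in
    the article still finds its bit set (assumes a uniform hash)."""
    return 1.0 - (1.0 - 1.0 / table_bits) ** unique_words


if __name__ == "__main__":
    article = "Just take any hashing function and use it to set bits"
    table = build_table(article)
    print(might_contain(table, "hashing"))   # True: words present always hit
    print(expected_hit_ratio(151))           # just under 0.10
    print(math.ceil(TABLE_BITS / 8))         # 188 bytes for the raw table
```

With a uniform hash, 151 words in 1500 bits leaves a bit under a tenth of the table set, which is where the 10% hit ratio comes from. Shrinking the table raises that ratio, which is the trade-off in question.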