Path: utzoo!utstat!jarvis.csri.toronto.edu!rutgers!iuvax!watmath!looking!brad
From: brad@looking.on.ca (Brad Templeton)
Newsgroups: news.software.b
Subject: Not Checksums, how about string search hash codes?
Message-ID: <49602@looking.on.ca>
Date: 18 Nov 89 18:23:33 GMT
Organization: Looking Glass Software Limited, Waterloo ON
Lines: 24
Class: misc

As I understand it (it's been a while), there are some nifty algorithms that
generate hash numbers that can make string searches in regions of text very
fast. In particular, using such hash numbers, you can tell right away, 95% of
the time, if a given string is NOT in a body of text. I.e., if the function
returns false, you can be sure the string is not found. If it returns true,
the string might be found, and you have to search further: different hash
functions, and eventually a full-text search to verify things.

Does anybody know more about this and care to post it?

Anyway, it seems it would be great to calculate the hash numbers for news
articles as they are generated and put them in the header. Then article-kill
(or find) programs would be many times more efficient, as would programs that
ask, "Where was that article that talked about hashing?"

In particular, things would be far more efficient for NNTP over slower links,
since you could usually decide about an article from its header alone, even
when you are searching for strings in its body.

This would be far more useful than a checksum.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
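The "definitely not present / maybe present" test described in this post is essentially a Bloom filter (Bloom, 1970), also called a superimposed-coding signature in text retrieval: a fixed-size bit array where each word of the text sets a few hash-chosen bits. A query word whose bits are not all set cannot be in the text. The sketch below is one possible way to do it, not anything specified in the post. It assumes word-level matching, a 1024-bit filter, 3 bit positions per word, and SHA-256 as a portable hash so every site computes the same bits. The header name `X-Body-Signature` and all function names are hypothetical. With about 150 distinct words in an article, the standard estimate of the false-positive rate per query word is (1 - e^(-kn/m))^k = (1 - e^(-3*150/1024))^3, roughly 4.5%. That is in the same range as the 95% figure above.

```python
import hashlib
import re

# Assumed parameters: 1024 bits -> 256 hex digits in a header line.
M_BITS = 1024
K_HASHES = 3  # bit positions set per word


def words(text: str) -> set[str]:
    """Split text into lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def positions(word: str) -> list[int]:
    """Derive K_HASHES bit positions from one SHA-256 digest of the word.
    A fixed hash (not Python's randomized hash()) keeps results identical
    on the posting site and every reading site."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % M_BITS
            for i in range(K_HASHES)]


def make_signature(body: str) -> int:
    """Build the Bloom filter for an article body as an M_BITS-bit integer."""
    sig = 0
    for w in words(body):
        for p in positions(w):
            sig |= 1 << p
    return sig


def to_header(sig: int) -> str:
    """Encode the filter as fixed-width hex for a (hypothetical) header."""
    return format(sig, f"0{M_BITS // 4}x")


def from_header(value: str) -> int:
    """Decode the hex header value back into the filter bits."""
    return int(value, 16)


def might_contain(sig: int, query: str) -> bool:
    """False: some query word is definitely absent from the body.
    True: all query words may be present; confirm with a full-text search."""
    for w in words(query):
        for p in positions(w):
            if not (sig >> p) & 1:
                return False
    return True


if __name__ == "__main__":
    body = "Hash codes in the header would let kill programs skip most articles."
    header = "X-Body-Signature: " + to_header(make_signature(body))
    sig = from_header(header.split(": ", 1)[1])
    print(might_contain(sig, "kill programs"))  # True: both words were inserted
    print(might_contain(sig, "checksum"))       # usually False; True would be a false positive
```

The filter works on individual words, so a multi-word query also passes when its words occur but not as a phrase. That is just another false positive, and the follow-up full-text search removes it. Over NNTP, a reader would only need to fetch an article body when `might_contain` returns True.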