Path: utzoo!utstat!jarvis.csri.toronto.edu!mailrus!uunet!cs.utexas.edu!usc!bbn!bbn.com!rsalz
From: rsalz@bbn.com (Rich Salz)
Newsgroups: news.software.b
Subject: Re: Modifying news storage for fast searches
Message-ID: <2179@prune.bbn.com>
Date: 22 Nov 89 16:49:55 GMT
References: <51195@looking.on.ca>
Organization: BBN Systems and Technologies Corporation
Lines: 29

In <51195@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>Well, if news.software.b is any indication, a fast search system for
>News might not be too bad.

>Another idea is to store articles in a special compressed form that lists
>the dictionary first (ie. the list of words) followed by the text expressed
>and indices into the word list.

Free-text retrieval is basically a solved problem.  Go buy books by
Gerard Salton.  Check out your Unix documentation for "Some Applications
of Inverted Indexes on the UNIX System" by Mike Lesk (USD:30 in the BSD
docs, I don't know where for other systems -- 2B for Version 7, I think).

There was a mini text-retrieval system that appeared in comp.sources.misc;
qndxr, I think the name was.  There will be a bigger system in c.s.unix in
a couple of weeks.

Associative retrieval -- "give me more articles like THIS one" -- was first
proposed in the 1950's.  Thinking Machines has one hell of a sexy demo
on it.

To follow the Usenet trend of "I said it first," I guess I should say
that I proposed this on the news-interfaces list nearly a year ago.
	/r$
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.