Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 4.3bsd-beta 6/6/85; site nsc.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!nsc!chuqui
From: chuqui@nsc.UUCP (Chuq Von Rospach)
Newsgroups: net.news
Subject: Re: keyword-based news
Message-ID: <3210@nsc.UUCP>
Date: Wed, 2-Oct-85 13:08:01 EDT
Article-I.D.: nsc.3210
Posted: Wed Oct 2 13:08:01 1985
Date-Received: Fri, 4-Oct-85 05:04:07 EDT
References: <820@vortex.UUCP>
Reply-To: chuqui@nsc.UUCP (Chuq Von Rospach)
Organization: Ninja Ewok Training Grounds
Lines: 32

In article <820@vortex.UUCP> lauren@vortex.UUCP (Lauren Weinstein) writes:
>For quite a few years, I've been using a very elaborate keyword-based
>system for searching a large newswire story database. This database
>is in a centralized location so there is no concern about COSTS associated
>with extra matches, unlike the Usenet situation.
>
>One thing I learned long ago thanks to this system--it is almost
>IMPOSSIBLE to avoid major missed matches AND extra matches. If you
>try to make your keyword choices very specific and negate out topics
>of no interest, you frequently (*VERY* frequently) find that you're missing
>great numbers of stories that you really DID want to see, but where
>a particular keyword you specified wasn't used. Or you find that *MANY*
>stories you wanted to filter OUT still get through since the keywords
>you wanted to SKIP weren't used.

Lauren has a point, but if this system is like all of the other newswire
searching systems I've seen, it has limited applicability to a
keyword-based news system. The problem is that doing keyword searches on
a general database IS going to bring forward lots of silly matches,
because the words just happen to be used in otherwise unrelated articles.

What I'm planning on doing for NNTN, though, is to have the author attach
the appropriate keywords to the article. Rather than simply grepping the
text for the words, you look at only the keywords the author thinks are
important.
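
To make the distinction concrete, here is a minimal sketch in Python. The
Article record, its field names, and the two sample articles are all
invented for illustration; none of this is taken from NNTN itself.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical article record; the field names are assumptions.
    subject: str
    body: str
    keywords: set = field(default_factory=set)  # chosen by the author


def grep_match(article, word):
    """Full-text search: true if the word appears anywhere in the body."""
    return word.lower() in article.body.lower()


def keyword_match(article, word):
    """Keyword search: true only if the author listed the word."""
    return word.lower() in {k.lower() for k in article.keywords}


# Made-up sample data: one article about maps, one that merely uses the word.
ARTICLES = [
    Article("uucp map update", "New maps for the backbone sites.",
            {"uucp", "maps"}),
    Article("chili recipe", "Stir it until it maps onto your idea of thick.",
            {"cooking"}),
]

grep_hits = [a.subject for a in ARTICLES if grep_match(a, "maps")]
keyword_hits = [a.subject for a in ARTICLES if keyword_match(a, "maps")]
```

Grepping the body for "maps" pulls in both articles, including the recipe
that only happens to use the word. Matching against the author's keywords
returns just the map update. The sketch also shows the other side of the
trade: had the first author left "maps" off the keyword list, the keyword
match would have missed that article entirely.
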
Even if the author is completely incompetent at choosing keywords, this
should keep the accidental matches down to a minimum.

You can't ignore the problem, but you also have to realize that Lauren's
example is, to a great extent, an apples-and-oranges comparison to USENET
and its problems.
--
:From under the bar at Callahan's:   Chuq Von Rospach
nsc!chuqui@decwrl.ARPA               {decwrl,hplabs,ihnp4,pyramid}!nsc!chuqui

If you can't talk below a bellow, you can't talk...