Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version VT1.00C 11/1/84; site vortex.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!ucbvax!decvax!bellcore!vortex!lauren
From: lauren@vortex.UUCP (Lauren Weinstein)
Newsgroups: net.news
Subject: Re: keyword-based news
Message-ID: <846@vortex.UUCP>
Date: Tue, 22-Oct-85 13:31:11 EDT
Article-I.D.: vortex.846
Posted: Tue Oct 22 13:31:11 1985
Date-Received: Fri, 25-Oct-85 02:03:53 EDT
References: <1461@teddy.UUCP>
Organization: Vortex Technology, Los Angeles
Lines: 63

I think I'd agree that just using the words from the first part of a
message (if we could find and ignore all the included text in different
forms from older articles, the cute opening lines and line eater bug
lines) might be better than people trying to pick their own keywords.
(Actually, skipping the previously included text might not work, since
often people add comments on the end of such text that would have no
meaning without the included text. This means that you're stuck trying
to keyword both the included text (again!) and the "new" text.)

But even if you did the above and did a fairly good job of it, it's
still not good enough. It's only "better" since letting people (in an
uncontrolled keyword environment) pick their own keywords is SOOOO bad.
We (humans) can tell what the meaning of a message is (much of the time)
quite quickly because we do considerable analysis of the text while
we're reading! We automatically ignore the "extraneous" words in a
manner that would be difficult for even a sophisticated program to
accomplish.

And the big problems still remain. People's random word choices when
they write their text result in massive keyword list expansion. Without
centralized keyword control, making good keyword search choices remains
exceedingly difficult (even with such control, it's still very
difficult). Also, analysis of word formats, pluralisms, usage, etc.
still must be considered and are non-trivial problems.
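
To make the word-form problem concrete, here is a minimal sketch of
taking keywords from the opening lines of a message. It is not anything
from existing news software. The ">" quoting prefix, the tiny stopword
list, the five-line window, and the name opening_keywords are all
assumptions chosen for illustration.

```python
import re

# Hypothetical stopword list; a real one would be far longer.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "is", "are",
             "and", "for", "i", "my"}


def opening_keywords(message, max_lines=5):
    """Return the lowercased words found in the first few new lines.

    Assumes included text is marked with a leading '>', which is only
    one of the several quoting forms actually in use.
    """
    new_lines = [ln for ln in message.splitlines()
                 if ln.strip() and not ln.lstrip().startswith(">")]
    text = " ".join(new_lines[:max_lines]).lower()
    words = re.findall(r"[a-z]+", text)
    return {w for w in words if w not in STOPWORDS}


# Two made-up messages about the same subject.
a = opening_keywords("> old text about modems\nMy modem drops the line")
b = opening_keywords("Modems dropping lines on uucp links")

print(sorted(a))
print(sorted(b))
print(sorted(a & b))  # keywords the two messages have in common
```

In this sketch the two messages share no keyword at all, because
"modem"/"modems" and "line"/"lines" are treated as different strings.
Folding such forms together reliably is the non-trivial part.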
All four of the keyword error modalities still exist, as do the related
control and coordination problems. Also, we must not forget the
percentage of messages that will horribly fail the "beginning of
message" meaning test and generate all sorts of "noise" into the keyword
systems.

No, it just doesn't fly. What I think was really being said was that
people's choices for keywords are SO BAD that EVEN taking words from the
first part of text is better. But that doesn't mean that taking those
words solves the fundamental problems with keyword systems (which we've
hashed over quite extensively in this group as of late!)

The only way to make a keyword system work at all, even "moderately"
well, in any environment, is to have a centrally controlled and
organized keyword base, with keywords being carefully selected and
organized by people who have the time and inclination to do such work.
I don't see this happening in the Usenet environment, for technical,
logistical, and also "sociological" reasons.

Also, the above doesn't even start to address the problems that any
keyword system (that might be designed to replace newsgroups) would
cause for traffic control at sites that needed to limit certain kinds
of traffic. Nor does it address the fact that even WITH carefully
controlled and "professionally" chosen keywords such systems take a
great deal of time and practice to use even minimally well.

I'd like to second the idea that someone else already posted. If you
think you like the idea of keyword systems but are personally
unfamiliar with the way REAL keyword systems work--go to a library and
try out their search services. Keep in mind that they operate with a
VERY carefully controlled keyword base--and be sure to pick a topic
where you'll know how many articles you're MISSING during your
searches, and how many UNDESIRED ones you're getting as well. It may be
an interesting experience for you.

--Lauren--