Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version VT1.00C 11/1/84; site vortex.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!ucbvax!decvax!bellcore!vortex!lauren
From: lauren@vortex.UUCP (Lauren Weinstein)
Newsgroups: net.news
Subject: Re: keyword-based news
Message-ID: <846@vortex.UUCP>
Date: Tue, 22-Oct-85 13:31:11 EDT
Article-I.D.: vortex.846
Posted: Tue Oct 22 13:31:11 1985
Date-Received: Fri, 25-Oct-85 02:03:53 EDT
References: <1461@teddy.UUCP>
Organization: Vortex Technology, Los Angeles
Lines: 63

I think I'd agree that just using the words from the first part of
a message (if we could find and ignore all the text included, in its
various forms, from older articles, plus the cute opening lines and
line eater bug lines) might be better than people trying to pick their
own keywords.
(Actually, skipping the previously included text might not work, since
often people add comments on the end of such text that would have no
meaning without the included text.  This means that you're stuck
trying to keyword both the included text (again!) and the "new" text.)
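
To make the included-text problem concrete, here is a minimal sketch of
"take the words from the first part of the message, skipping included
text".  Everything in it is an assumption for illustration: the quote
prefixes, the tiny stopword list, the function name leading_keywords and
the sample article are all made up, and real articles mark included text
in many more ways than this.

```python
import re

# Assumed quoting prefixes; real articles use many other forms.
QUOTE_PREFIXES = (">", "|", ":")
# Tiny illustrative stopword list, not a serious one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "that", "is", "in", "it", "i"}


def leading_keywords(body, max_lines=5):
    """Collect distinct lowercase words from the first max_lines
    non-blank lines that do not look like included (quoted) text."""
    words = []
    kept = 0
    for line in body.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(QUOTE_PREFIXES):
            continue  # skip blank lines and included text
        kept += 1
        for w in re.findall(r"[a-z']+", stripped.lower()):
            if w not in STOPWORDS and w not in words:
                words.append(w)
        if kept >= max_lines:
            break
    return words


article = """> Keyword systems would replace newsgroups.
Agreed, but only partly.
> Sites could filter by keyword.
That breaks down fast.
"""

print(leading_keywords(article))
```

With the included lines skipped, the "new" text yields only words like
"agreed", "partly" and "breaks", none of which says what the article is
about.  The subject words were all in the included text.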

But even if you did the above and did a fairly good job of it,
it's still not good enough.  It's only "better" since
letting people (in an uncontrolled keyword environment) pick their
own keywords is SOOOO bad.  

We (humans) can tell what the meaning of a message is (much of the time)
quite quickly because we do considerable analysis of the text while
we're reading!  We automatically ignore the "extraneous" words in a manner
that would be difficult for even a sophisticated program to accomplish.

And the big problems still remain.  People's random word choices
when they write their text result in massive keyword list expansion.
Without centralized keyword control, making good keyword search choices 
remains exceedingly difficult (even with such control, it's still very 
difficult).  Also, word forms, plurals, usage, etc. still must be
analyzed, and these are non-trivial problems.  All four of the
keyword error modalities still exist, as do the related control and
coordination problems.
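
A small sketch of the word-form problem, under loud assumptions: the
plural-stripping rule below (naive_singular) is a crude made-up rule, not
anyone's real algorithm, and the word list is invented.  It shows both
how raw word choices expand the keyword list and how a simple fix
introduces errors of its own.

```python
def naive_singular(word):
    """Crude plural stripping.  An assumed rule for illustration only;
    it is wrong for many English words."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


# Invented sample of words as different posters might write them.
raw = ["modem", "modems", "Modem", "query", "queries", "news", "bus", "class"]

distinct_raw = set(raw)
distinct_folded = {naive_singular(w.lower()) for w in raw}

print(len(distinct_raw), sorted(distinct_raw))
print(len(distinct_folded), sorted(distinct_folded))
```

Case folding and plural stripping shrink eight distinct keywords to
five, but the same rule turns "news" into "new", which is a different
word entirely.  Patching that case by hand is exactly the kind of
central control work described above.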

Also, we must not forget the percentage of messages that will horribly
fail the "beginning of message" meaning test and inject all sorts
of "noise" into the keyword systems.

No, it just doesn't fly.  What I think was really being said was
that people's choices for keywords are SO BAD that EVEN taking words from 
the first part of text is better.  But that doesn't mean that taking those
words solves the fundamental problems with keyword systems (which
we've hashed over quite extensively in this group as of late!)

The only way to make a keyword system work at all, even "moderately"
well, in any environment, is to have a centrally controlled and organized
keyword base, with keywords being carefully selected and organized by
people who have the time and inclination to do such work.
I don't see this happening in the Usenet environment, for technical,
logistical, and also "sociological" reasons.

Also, the above doesn't even start to address the problems that any keyword 
system (that might be designed to replace newsgroups) would cause for 
traffic control at sites that needed to limit certain
kinds of traffic.  Nor does it address the fact that even WITH carefully
controlled and "professionally" chosen keywords such systems take
a great deal of time and practice to use even minimally well.

I'd like to second the idea that someone else already posted.  If you
think you like the idea of keyword systems but are personally
unfamiliar with the way REAL keyword systems work--go to a library
and try out their search services.  Keep in mind that they operate with
a VERY carefully controlled keyword base--and be sure to pick a topic
where you'll know how many articles you're MISSING during your searches,
and how many UNDESIRED ones you're getting as well.
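
For anyone who wants to keep score during such an experiment, the two
counts above are usually summarized as recall (how much of the relevant
material you found) and precision (how much of what you got was
relevant).  A minimal sketch follows; the article IDs and the function
name precision_recall are hypothetical, and it assumes you already know
the full set of relevant articles for your topic.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: IDs the search returned.
    relevant:  IDs you know to be on topic.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: five articles are truly relevant, four came back.
relevant = {"a1", "a2", "a3", "a4", "a5"}
retrieved = {"a1", "a2", "x7", "x8"}

p, r = precision_recall(retrieved, relevant)
missed = relevant - retrieved       # articles you are MISSING
undesired = retrieved - relevant    # UNDESIRED articles you got

print(p, r, sorted(missed), sorted(undesired))
```

In this made-up case half of what came back was unwanted, and three of
the five relevant articles never showed up at all.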

It may be an interesting experience for you. 

--Lauren--