Path: utzoo!attcan!uunet!cs.utexas.edu!tut.cis.ohio-state.edu!pt.cs.cmu.edu!cat.cmu.edu!jps
From: jps@cat.cmu.edu (James Salsman)
Newsgroups: comp.society.futures
Subject: Re: looking for work on text `interest score'
Message-ID: <5696@pt.cs.cmu.edu>
Date: 29 Jul 89 10:31:09 GMT
References: <8907271735.AA05275@multimax.encore.com>
Organization: Carnegie Mellon
Lines: 46

If you want to evaluate netnews for interest level, just
build a rule-based expert system based on regexp matching.
Whenever you are shown something that you don't want to see,
tell the user interface why (bogus author, boring subject,
too many others with same topic, too long, too many
buzzwords, etc.) and have it store that data as a new or
modified rule.  Even better, after each article the
interface could ask for a critique of the message (Thumbs Up
or Down, plus a reason from a ~10 item menu, and maybe a
keyword entry list if the menu item calls for one).  The
newsreader's rule base would then slowly mutate to suit your
choices.  At the end of the session you could be asked to
verify all of the mutations that your critiques produced,
just in case you changed your mind.
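
Here is a minimal sketch of such a rule base in Python, just to
make the idea concrete.  Everything in it is an assumption made
for illustration: the names Rule, RuleBase, critique and commit,
the additive weights, and the dict-of-fields article format are
all hypothetical, not part of any existing newsreader.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    field: str     # article field to match: "author", "subject" or "body"
    pattern: str   # regular expression, matched case-insensitively
    weight: float  # negative lowers the interest score, positive raises it
    reason: str    # the menu item the reader picked


class RuleBase:
    def __init__(self):
        self.rules = {}    # (field, pattern) -> Rule
        self.pending = []  # mutations awaiting end-of-session confirmation

    def score(self, article):
        """Sum the weights of every confirmed rule that matches the article."""
        total = 0.0
        for rule in self.rules.values():
            text = article.get(rule.field, "")
            if re.search(rule.pattern, text, re.IGNORECASE):
                total += rule.weight
        return total

    def critique(self, field, pattern, thumbs_up, reason, step=1.0):
        """Record a Thumbs Up/Down as a pending mutation; nothing changes yet."""
        delta = step if thumbs_up else -step
        self.pending.append(Rule(field, pattern, delta, reason))

    def commit(self, accept=lambda rule: True):
        """Apply the pending mutations that the reader confirms via accept()."""
        for mutation in self.pending:
            if not accept(mutation):
                continue
            key = (mutation.field, mutation.pattern)
            if key in self.rules:
                # Modify an existing rule: nudge its weight, keep latest reason.
                self.rules[key].weight += mutation.weight
                self.rules[key].reason = mutation.reason
            else:
                # Store a new rule.
                self.rules[key] = mutation
        self.pending = []


rules = RuleBase()
article = {"author": "someone@example.com", "subject": "Re: vi vs emacs"}
rules.critique("subject", r"vi vs\.? emacs", thumbs_up=False,
               reason="boring subject")
rules.commit()  # end of session: accept everything
print(rules.score(article))
```

The critiques are kept in a pending list so that the
end-of-session verification can throw out any you regret;
passing a different accept function to commit() is where the
"are you sure?" dialog would hook in.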

In article <8907271735.AA05275@multimax.encore.com> ST601716@BROWNVM.BITNET ("Seth R. Trotz") writes:

>      It is certainly a good suggestion that a neural network is
> well suited to the task of providing a numeric rating for some form of
> input. If the entire text of an article were fed into the network ...
> the network would have to be huge!!

Right.  At CMU some of McClelland's students are working on
connectionist parsing algorithms.  IMHO, they have ignored
the comp-sci theory behind parsing, so not only are they
re-inventing the wheel, but they are taking plenty of time
in doing so, and making up new terms for things compiler
writers have almost standardized.  Talk about a
Tower-of-Babel effect!  There are a few good researchers
starting to emerge in the field of "symbolic connectionism."

> What you need to do, I would guess,
> is provide some form of hash function to reduce the task. Perhaps create
> a dictionary of 10,000 of the most common words in the English language. This
> would cover a good percentage of all words in any given article. 

It would also ignore morphological characteristics of
words, which convey much of the meaning.  Multilevel
parsing <--> planning is the way to go.
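
To illustrate the morphology point, here is a toy Python sketch.
The four-word COMMON_WORDS set stands in for the proposed
10,000-word dictionary, and the suffix list is a deliberately
crude assumption of mine, not a real morphological analyzer.

```python
# Toy stand-in for a dictionary of the most common English words.
COMMON_WORDS = {"parse", "network", "rate", "article"}

# A few inflectional suffixes, tried in this order (an assumption,
# far from a complete account of English morphology).
SUFFIXES = ("ing", "ed", "s", "es")


def lookup_exact(word):
    """Plain dictionary lookup: inflected forms simply fall through."""
    w = word.lower()
    return w if w in COMMON_WORDS else None


def lookup_with_suffix(word):
    """Return (stem, suffix) if the word is a known stem plus a suffix."""
    w = word.lower()
    if w in COMMON_WORDS:
        return (w, "")
    for suffix in SUFFIXES:
        if w.endswith(suffix):
            stem = w[:-len(suffix)]
            # Also try restoring a dropped final "e" (parsing -> parse).
            for candidate in (stem, stem + "e"):
                if candidate in COMMON_WORDS:
                    return (candidate, suffix)
    return None


for word in ("parsing", "networks", "rated"):
    print(word, lookup_exact(word), lookup_with_suffix(word))
```

The exact lookup drops "parsing", "networks" and "rated"
entirely, while the suffix-aware one recovers both the stem and
the ending, and the ending (tense, number) is part of what the
word means.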

:James

Disclaimer:  The University thinks I'm insane, or something.
-- 

:James P. Salsman (jps@CAT.CMU.EDU)