Path: utzoo!attcan!uunet!cs.utexas.edu!tut.cis.ohio-state.edu!pt.cs.cmu.edu!cat.cmu.edu!jps
From: jps@cat.cmu.edu (James Salsman)
Newsgroups: comp.society.futures
Subject: Re: looking for work on text `interest score'
Message-ID: <5696@pt.cs.cmu.edu>
Date: 29 Jul 89 10:31:09 GMT
References: <8907271735.AA05275@multimax.encore.com>
Organization: Carnegie Mellon
Lines: 46

If you want to evaluate netnews for interest level, just build a
rule-based expert system based on regexp matching. Whenever you
are shown something that you don't want to see, tell the user
interface why (bogus author, boring subject, too many others with
same topic, too long, too many buzzwords, etc.) and have it store
that data as a new or modified rule.

Even better, after each article the interface could ask for a
critique of the message (Thumbs Up or Down, and a reason why from
a ~10 item menu; maybe a keyword entry list if the menu item
dictates), and the newsreader's rule base would slowly mutate to
suit your choices. At the end of the session you could be asked
to verify all of the mutations that you've selected, just in case
you changed your mind.

In article <8907271735.AA05275@multimax.encore.com> ST601716@BROWNVM.BITNET ("Seth R. Trotz") writes:
> It is certainly a good suggestion that a neural network is
> well suited to the task of providing a numeric rating for some form of
> input. If the entire text of an article were fed into the network ...
> the network would have to be huge!!

Right. At CMU some of McClelland's students are working on
connectionist parsing algorithms. IMHO, they have ignored the
comp-sci theory behind parsing, so not only are they re-inventing
the wheel, but they are taking plenty of time in doing so, and
making up new terms for things compiler writers have almost
standardized. Talk about a Tower-of-Babel effect!

There are a few good researchers starting to emerge in the field
of "symbolic connectionism."

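Going back to the rule-based filter suggested at the top of this
post, here is a rough sketch of how the critique/verify loop might
look. Everything in it is an assumption made for illustration: the
names Rule, RuleBase, critique and verify are hypothetical, an
article is taken to be a plain dict of header and body strings, and
each thumbs-up or thumbs-down simply adds a +1 or -1 keyword rule.
A real newsreader would want a richer rule language than this.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    part: str      # which part of the article to test, e.g. "author" or "subject"
    pattern: str   # regular expression searched for in that part
    weight: int    # positive = more interesting, negative = less
    reason: str    # menu item the reader picked, kept for later review


@dataclass
class RuleBase:
    rules: list = field(default_factory=list)    # rules in force
    pending: list = field(default_factory=list)  # this session's unverified mutations

    def score(self, article):
        """Sum the weights of every active rule that matches the article."""
        total = 0
        for rule in self.rules:
            text = article.get(rule.part, "")
            if re.search(rule.pattern, text, re.IGNORECASE):
                total += rule.weight
        return total

    def critique(self, thumbs_up, part, keyword, reason):
        """Record the reader's verdict as a pending keyword rule."""
        weight = 1 if thumbs_up else -1
        # re.escape treats the keyword as literal text, not as a regexp
        self.pending.append(Rule(part, re.escape(keyword), weight, reason))

    def verify(self, accept):
        """End of session: keep only the pending rules the reader confirms."""
        self.rules.extend(r for r in self.pending if accept(r))
        self.pending.clear()


rule_base = RuleBase()
rule_base.critique(False, "subject", "interest score", "boring subject")
rule_base.verify(lambda rule: True)  # reader confirms every mutation
print(rule_base.score({"subject": "Re: looking for work on text interest score"}))
```

Keeping the pending rules apart from the active ones is what lets
the reader change his mind at the end of the session, since nothing
affects scoring until verify is called.
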
> What you need to do, I would guess,
> is provide some form of hash function to reduce the task. Perhaps create
> a dictionary of 10,000 of the most common words in the English language. This
> would cover a good percentage of all words in any given article.

It would also ignore morphological characteristics of words,
which convey much of the meaning.

Multilevel parsing <--> planning is the way to go.

:James

Disclaimer: The University thinks I'm insane, or something.
-- 
:James P. Salsman (jps@CAT.CMU.EDU)