Path: utzoo!attcan!uunet!lll-winken!lll-tis!mordor!joyce!ames!pasteur!ucbvax!CS.ROCHESTER.EDU!nl-kr-request
From: nl-kr-request@CS.ROCHESTER.EDU (NL-KR Moderator Brad Miller)
Newsgroups: comp.ai.nlang-know-rep
Subject: NL-KR Digest Volume 5 No. 15
Message-ID: <8809130129.AA09965@teak.cs.rochester.edu>
Date: 13 Sep 88 01:12:00 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Reply-To: nl-kr@CS.ROCHESTER.EDU
Organization: University of Rochester, Department of Computer Science
Lines: 211
Approved: nl-kr@cs.rochester.edu

NL-KR Digest (9/12/88 21:11:42) Volume 5 Number 15

Today's Topics:
        Re: open/closed classes
        nl evaluation workshop
        Data Wanted:
        Re: GPSG parsers

Submissions: NL-KR@CS.ROCHESTER.EDU
Requests, policy: NL-KR-REQUEST@CS.ROCHESTER.EDU
----------------------------------------------------------------------

Date: Thu, 1 Sep 88 07:54 EDT
From: Bruce E. Nevin
Subject: open/closed classes

There is some recent work by Leonard Talmy on the supposed cognitive whys and wherefores of open vs closed classes. Sorry, I don't have a reference handy.

The supposition that a speech recognizer has to be especially good at hearing closed-class words misses an important point: the closed-class words are unstressed and generally subject to reduction in phonemic--how shall I say--extent. This is part of a general process, apparently in all languages, of reducing the phonemic representation of words that carry less information. They are reducible to the extent that they are redundant. (Not much difficulty predicting the filler in the context 'He __ gone.' You only need enough phonemic content to distinguish the words 'has, had, was' plus of course more obvious--and less reduced--constructions incorporating these such as their negatives, `will have gone', etc.)

Historically, closed-class morphology derives from open-class words that have become more redundant and predictable, so that their reduced forms become `frozen' in their now predictable contexts.
An example is the suffix -hood in `childhood', from an earlier form had meaning `state', something like `child-state'. The suffix -ly in adverbs of manner derives from the dative of a word for `form, body'. Ancestors of Proto-Indo-European not having been reconstructed, we have no confirmation that this is the origin of inflectional morphology such as the preterit in descendant languages like English, but that is certainly the most plausible assumption. In American Indian languages, Shirley Silver dubbed this process `morphemization' almost 20 years ago.

So affixes (inherently closed-class morphology) appear to be derived by reduction from once free-standing words. Similarly for closed-class words. `Because' derives from `by cause'. OED cites 1305 `bi cause whi'; whi or `why' is the instrumental of the wh- pronouns typified by `what', reduced to `that' in the later `by cause that, because that'. (Compare reduction of cause to zero in `for the cause why' --> `forwhy', a common conjunction now obsolete, to which compare further `from the place where' --> `from where'.) Zeroing of `why ~ that' in `because why, because that' leaves `because' as a conjunction, a closed-class word. (See Jespersen _Modern English Grammar on Historical Principles_ V 397 and Harris _A Grammar of English on Mathematical Principles_ 195 for further details.)

An example currently in progress in English is `going to' --> `gonna', a reduction that takes place before verbs but not before nouns (*`I'm gonna New York') precisely because `going to' can occur before the whole class of verbs (and consequently carries less information and is subject to reduction there) but cannot occur before every possible noun. (Note that in e.g. `I'm going to authority' an indefinite noun, one of exceptionally broad distribution, can be understood as having been elided: `I'm going to someone of/in authority'.
It is not possible to reverse a reduction in this way to account for the broad distribution of `going to' before verbs.) This appears to be on the way to being a separate future tense morpheme in the closed-class set.

The above example of `forwhy' illustrates that closed-class words also become obsolete and drop from the language. The class is closed with respect to distribution, and conservative but not closed with respect to change.

Bruce Nevin
bn@cch.bbn.com

------------------------------

Date: Fri, 2 Sep 88 12:19 EDT
From: palmer@PRC.Unisys.COM
Subject: nl evaluation workshop

CALL FOR PARTICIPATION

Workshop on Evaluation of Natural Language Processing Systems

Dec 8-9
Wayne Hotel, Wayne, PA (Philadelphia)

There has been much recent interest in the difficult problem of evaluating natural language systems. With the exception of natural language interfaces there are few working systems in existence, and they tend to be concerned with very different tasks and use equally different techniques. There has been little agreement in the field about training sets and test sets, or about clearly defined subsets of problems that constitute standards for different levels of performance. Even those groups that have attempted a measure of self-evaluation have often been reduced to discussing a system's performance in isolation - comparing its current performance to its previous performance rather than to another system. As this technology begins to move slowly into the marketplace, the need for useful evaluation techniques is becoming more and more obvious. The speech community has made some recent progress toward developing new methods of evaluation, and it is time that the natural language community followed suit. This is much more easily said than done and will require a concentrated effort on the part of the field.
There are certain premises that should underlie any discussion of evaluation of natural language processing systems:

(1) It should be possible to discuss system evaluation in general without having to state whether the purpose of the system is "question-answering" or "text processing." Evaluating a system requires the definition of an application task in terms of I/O pairs which are equally applicable to question-answering, text processing, or generation.

(2) There are two basic types of evaluation: a) "black box evaluation" which measures system performance on a given task in terms of well-defined I/O pairs; and b) "glass box evaluation" which examines the internal workings of the system. For example, glass box performance evaluation for a system that is supposed to perform semantic and pragmatic analysis should include the examination of predicate-argument relations, referents, and temporal and causal relations.

Given these premises, the workshop will be structured around the following three sessions:

1) Defining "glass box evaluation" and "black box evaluation."

2) Defining criteria for "black box evaluation." _A Proposal for establishing task oriented benchmarks for NLP Systems_ (Session Chair - Beth Sundheim)

3) Defining criteria for "glass box evaluation." (Session Chair - Jerry Hobbs)

Several different types of systems will be discussed, including question-answering systems, text processing systems and generation systems.

Researchers interested in participating are requested to submit a short (250-500 word) description of their experience and interests, and what they could contribute to the workshop. In particular, if they have been involved in any evaluation efforts that they would like to report on, they should include a short abstract (500-1000 words) as well. The number of participants at the workshop must be restricted due to limited room size.
The descriptions and abstracts will be reviewed by the following committee: Martha Palmer (Unisys), Mitch Marcus (University of Pennsylvania), Beth Sundheim (NOSC), Ed Hovy (ISI), Tim Finin (Unisys), Lynn Bates (BBN). Submissions should arrive at the address given below no later than October 1st. Responses to all who submit abstracts or descriptions will be sent by November 1st.

Martha Palmer
Unisys Research & Development
PO Box 517
Paoli, PA 19301

palmer@prc.unisys.com
(215) 648-7228

------------------------------

Date: Mon, 5 Sep 88 16:36 EDT
From: Mark William Hopkins
Subject: Data Wanted:

I am in need of some English text, for setting up a data base. If you have any to contribute please e-mail it to me.

I asked Jerry Lewis to set up a telethon for this, but he said he was busy :-)

------------------------------

Date: Mon, 12 Sep 88 08:02 EDT
From: COR_HVH%HNYKUN52.BITNET@CUNYVM.CUNY.EDU
Subject: GPSG parsers

Some time ago I asked for information on GPSG parsers (or parser-generators) and promised to report any replies. Up to now, I have been notified of two efforts in this area.

At the Technical University in Berlin a PROLOG system is being developed in a machine translation context (Eurotra). It is able to parse and generate sentences according to a small English or a medium German grammar.

At Boeing work is being done on a LISP GPSG parser with the eventual aim of automatic message processing. The system can parse English sentences using a fairly large grammar and dictionary.

Neither system uses "pure" GPSG (if it exists at all), the most important difference being the absence of metarules.

I will ask both my contacts to do a more detailed write-up about their work and submit it to this list.

Hans van Halteren
COR_HVH@HNYKUN52.BITNET

------------------------------

End of NL-KR Digest
*******************