Path: utzoo!attcan!uunet!mailrus!ncar!mephisto!udel!burdvax!finin@prc.unisys.com
From: finin@prc.unisys.com (Tim Finin)
Newsgroups: comp.ai
Subject: Parsing Natural Language Using Generalized Mutual Information
Message-ID: <13534@burdvax.PRC.Unisys.COM>
Date: 12 Apr 90 15:39:58 GMT
Sender: news@PRC.Unisys.COM
Reply-To: finin@prc.unisys.com (Tim Finin)
Organization: Unisys Center for Advanced Information Technology
Lines: 33

                               AI SEMINAR
            UNISYS Center for Advanced Information Technology
                (formerly Unisys Paoli Research Center)

      Parsing Natural Language Using Generalized Mutual Information

The standard approach to parsing natural language is to use grammar-based
algorithms. While effective at characterizing and classifying sentences
with toy grammars, grammar-based parsing techniques are only as robust as
the grammars they use. But concisely characterizing the entire grammar of
a natural language is an extremely difficult task.

In this talk, I will present an alternative to the grammar-based approach:
a stochastic parsing method based on finding constituent boundaries, or
distituents, using a generalized mutual information statistic. This
method, called distituent parsing, rests on the hypothesis that
constituent boundaries can be extracted from a given part-of-speech n-gram
by analyzing the mutual information values within the n-gram. The
hypothesis is supported by the performance of an implementation of this
parsing algorithm, which determines all levels of sentence structure in a
variety of English texts with a relatively low error rate.

During this talk, I will derive the generalized mutual information
statistic, describe the parsing algorithm, and present results and sample
output from the parser. I will then discuss potential applications of this
approach in conjunction with traditional grammar-based techniques.

                   11:00 am, Tuesday, April 17, 1990
                         CAIT Conference Room
           Unisys Center for Advanced Information Technology
                     Great Valley Laboratories #1
                         70 E. Swedesford Road
                           Paoli, PA 19301

-- Non-Unisys visitors who are interested in attending should --
-- send email to finin@prc.unisys.com or call 215-648-2480.   --

--
Tim Finin                                   finin@prc.unisys.com
Center for Advanced Information Technology  215-648-2840, 215-648-2288 (fax)
Unisys, PO Box 517, Paoli, PA 19301         215-386-1749 (home)