Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!ames!oliveb!sun!plaid!chuq
From: chuq@plaid.UUCP
Newsgroups: comp.text.desktop
Subject: Re: document analysis
Message-ID: <20361@sun.uucp>
Date: Thu, 4-Jun-87 12:10:57 EDT
Article-I.D.: sun.20361
Posted: Thu Jun 4 12:10:57 1987
Date-Received: Sat, 6-Jun-87 08:19:32 EDT
Sender: news@sun.uucp
Distribution: comp
Lines: 46
Approved: desktop-request%plaid@sun.com

Date: Thu, 4 Jun 87 13:38:38+0300
From: nsc!nsta!nsta.UUCP!iddo (Iddo Carmon /NSTA (052)-522-267)
Organization: National Semiconductor (Israel) Ltd.

>From: mlwh@sphinx (Martin Hall)
>I would be interested in finding out about document analysis. I mean
>this in as general as a sense as you want to take it. Any pointers
>would be appreciated.

My view is that this kind of activity is best handled by a proper mix of
human/machine interaction. Consider the news system as an example: here
you have a massive amount of information to choose from, but you are
still able to handle it efficiently and select the things that are
relevant to you by means of software utilities of various degrees of
sophistication. However, these utilities all rely on a set of
conventions for putting things in header lines, which later enable the
system to locate articles in the newsgroup hierarchy, and on the
intelligence of a human poster who selects the proper newsgroups. Also,
the structure of the newsgroup hierarchy is developed by humans
according to their interests, and is a key factor in the ease of
selecting specific information.

Instead of treating a document as a 1-dimensional stream of characters
and trying to extract meaning from that, I'd like to see a common
general-purpose high-level 'document-programming' language evolve, one
that would be used to annotate the text and would then enable automatic
parsing of the document into sections and threads of reasoning,
selection of pieces by going down a subject menu tree, etc.
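
To make the idea of annotating text for automatic parsing a little more
concrete, here is a minimal sketch in Python. The convention it uses is
purely hypothetical and not taken from any existing system: a line of
the form ".section subject/subtopic" opens a section and files it under
that path in a subject tree. The names MARKER, parse_sections, select
and menu are all invented for this illustration.

```python
from collections import defaultdict

# Hypothetical annotation convention (an assumption for this sketch):
# a line ".section <subject>/<subtopic>/..." opens a new section and
# files it under that path in the subject tree.
MARKER = ".section "


def parse_sections(text):
    """Split annotated text into (subject_path, body) pairs."""
    sections = []
    path, body = None, []
    for line in text.splitlines():
        if line.startswith(MARKER):
            # Close the section that was open, if any.
            if path is not None:
                sections.append((path, "\n".join(body).strip()))
            path = tuple(line[len(MARKER):].strip().split("/"))
            body = []
        elif path is not None:
            body.append(line)
    if path is not None:
        sections.append((path, "\n".join(body).strip()))
    return sections


def select(sections, prefix):
    """Return the bodies of all sections filed under a subject prefix."""
    prefix = tuple(prefix.split("/"))
    return [body for path, body in sections if path[:len(prefix)] == prefix]


def menu(sections):
    """Build one level of the subject menu: top-level subject -> subtopics."""
    tree = defaultdict(set)
    for path, _ in sections:
        tree[path[0]].add("/".join(path[1:]))
    return {subject: sorted(subs) for subject, subs in tree.items()}


doc = """\
.section text/markup
Annotations mark the start of each section.
.section text/archiving
Archives can index sections by subject.
.section news/headers
Header lines let software route articles.
"""

sections = parse_sections(doc)
text_bodies = select(sections, "text")   # everything filed under "text"
subject_menu = menu(sections)            # top-level menu with subtopics
```

The point of the sketch is only that selection works from the
annotations alone; nothing in it tries to interpret the section bodies.
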
Such a convention may make it possible to scan/archive documents
according to their contents in numerous ways, without requiring a
"natural language understanding superexpert system".

--
Iddo Carmon                   Architecture Dept.
Tel: +972-52-522-267          National Semiconductor (Israel) Ltd.
uucp: ...!nsc!nsta!iddo       P.O.B. 3007, Herzlia B. 46104, Israel
      {hplabs,pyramid,sun,decwrl}

----------------------------------------
Submissions to:   desktop%plaid@sun.com -OR- sun!plaid!desktop
Administrivia to: desktop-request%plaid@sun.com -OR- sun!plaid!desktop-request
Paths: {ihnp4,decwrl,hplabs,seismo,ucbvax}!sun
Chuq Von Rospach	chuq@sun.COM		Delphi: CHUQ

Now, where did my ex-wife put my Fairy Dust?