Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!samsung!uunet!mcsun!news.funet.fi!fuug!sics.se!ifi.uio.no!enag
From: enag@ifi.uio.no (Erik Naggum)
Newsgroups: comp.text
Subject: Re: Public Domain Dictionary
Message-ID:
Date: 24 Jun 91 17:11:31 GMT
References: <91156.144339GONTER@awiwuw11.wu-wien.ac.at> <811@tivoli.UUCP> <1991Jun23.221908.20234@watdragon.waterloo.edu>
Sender: enag@ifi.uio.no (Erik Naggum)
Organization: Naggum Software, Oslo, Norway
Lines: 92
Nntp-Posting-Host: gyda.ifi.uio.no
In-Reply-To: tbray@watsol.waterloo.edu's message of 23 Jun 91 22:19:08 GMT
Originator: enag@gyda.ifi.uio.no

Tim Bray writes:
|
| It's not clear to me that an SGML-conformant dictionary is either
| necessary or desirable. A dictionary should be a small model of a
| human language. Not even SGML's strongest partisans claim for it
| an ability to model natural language.

I must have missed something really crucial. I have always thought
that a dictionary _entry_ is a structured unit of information in a
dictionary, containing other, smaller units of information, such as
word class, etymology, pronunciation, inflection, and a number of
definitions. What is the relevance of "natural languages" in this?

SGML is a language in which you express the structure of information,
among other things, and _all_ information has _some_ structure,
otherwise it's noise. SGML is suitable to express any kind of
structure which has a hierarchical nature, i.e. every element is
contained in toto in another element. There are some cases where
this is not true, and SGML fails to handle those cases in the
simplest way with attribute-less tags, yet it can be done with tags
and time-space coordinates and reference points to describe start
and stop of any event, including overlapping spatial elements.

I don't think you have a good grip on what SGML is, but you're not
alone.
The only wish I have is that those who have nth-hand information
and knowledge on SGML please try to verify it, especially as n
approaches infinity.

| Indeed. Frank Tompa of the New OED project at Waterloo, who has
| had a lot of experience with online dictionaries, in co-operation
| with Bob Amsler, then of Bellcore, now of Mitre, put in a lot of
| time and came up with a proposed SGML def for dictionaries. But
| it was tough, and even Tompa and Amsler were left somewhat
| unsatisfied that they had covered the bases.

This is not totally relevant to the OED2 project. The OED2 project
had some significant real life constraints to work with, such as an
existing dictionary. It's very unlikely to have a very large number
of dictionary entries be consistent with any given structure, unless
that structure is so large it becomes chaotic and useless.

If you sat down to work out a DTD ("SGML def"?) for a dictionary, you
would spend a large amount of time doing so, instead of randomly
stuffing things into dictionary entries with only intuitive
guidelines for structuring, so as not to confuse the poor user.
Document analysis and design are truly _hard_ tasks, and require a
lot more than people think. The complexity of the task of course
grows with the complexity of the document under analysis. That
doesn't mean it can't be done, which you imply. Once it's defined,
it should also capture the way we can best retrieve information from
a given instance, such as a dictionary entry. Of course this is
hard. What did you expect?

| I have to disagree on the "not pretty" part. Challenging,
| complex, somewhat irregular, yes, all of those are true. But this
| is no uglier than the English language that the OED is trying to
| describe.

I don't understand how you can put both a description and the object
of a description into one big bag and get anything useful out of it.
To me, it looks like you suffer from a severe layering confusion,
wherein an abstraction (description) of an entity can be no different
in complexity than the entity itself. This is a very naive view.
It's also remarkably counter-productive, as the main objective of
abstraction is to reduce complexity to a level where humans can
comfortably deal with it. I'm utterly amazed that this comes from
one who has worked with the OED2 dictionary project.

A description, or structure specification, or whatever, will
necessarily have to extract the essential elements of what is
described or specified. Otherwise, it's useless, as one can turn
only to the described element and get a better idea of it.
"Essential", of course, requires (human) intelligence and creativity
in discovering what is and is not essential. The whole task of
writing a definition is centered around discarding the unimportant.
A document type likewise requires that one extract the essentials,
according to one or a few views, which have to be known explicitly by
the designer.

It so happens that SGML is a language in which one can express the
interrelationships between elements of a hierarchical structure in
such a way as to produce a consistent type, of which any given
dictionary, dictionary entry, and on down, are instances.

I don't understand how you can claim that SGML can't model natural
languages. It wasn't intended to, and the question is completely
irrelevant to the structuring of dictionary entries. It's like
claiming that TeX can't model emotions, or that the programming
language C can't model sexual experiences.

--
Erik Naggum             Professional Programmer       +47-2-836-863
Naggum Software         Electronic Text
0118 OSLO, NORWAY       Computer Communications