Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 alpha 4/15/85; site enea.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!gatech!seismo!mcvax!enea!sommar
From: sommar@enea.UUCP (Erland Sommarskog)
Newsgroups: net.internat
Subject: Re: Hyphenation (mainly interactive)
Message-ID: <1090@enea.UUCP>
Date: Sat, 16-Nov-85 12:15:21 EST
Article-I.D.: enea.1090
Posted: Sat Nov 16 12:15:21 1985
Date-Received: Thu, 21-Nov-85 07:05:17 EST
References: <471@harvard.ARPA> <773@mmintl.UUCP> <968@enea.UUCP> <501@harvard.ARPA>
Reply-To: sommar@enea.UUCP (Erland Sommarskog)
Distribution: net
Organization: Enea Data, Sweden
Lines: 59

In article <501@harvard.ARPA> David A. Kosower discusses hyphenation, and argues in favour of algorithmic hyphenation over interactive and dictionary methods. Specifically, he presents an algorithm by Knuth-Liang, which is rather a combination of a dictionary and an algorithm. I shall not discuss this algorithm, since I have very little experience of TeX. Instead I'm going to develop my thoughts about interactive hyphenation.

David gives two main arguments against hyphenating interactively:
1) It takes too much time. Having to hyphenate the same words every time you format a document is no fun.
2) Humans also make errors. If I've understood him right, he's implying that an average writer would make more erroneous hyphenations than a good computer algorithm.

I start with 2). After having read the number of examples, I'm about to agree with David, as long as we stick to English. My reference point so far was Swedish, and I claim that hyphenation is simpler in our language. It has some basic rules which are overridden by concatenated words (easy for a human, a bit more difficult for a computer) and loan words (can be a difficulty even for a native Swede sometimes).
Since concatenated words are very frequent in Swedish, a method like Knuth-Liang using fragments of words is probably superior to a true dictionary method.

And so to 1). Of course it would not be acceptable for a text formatter to ask about the same word twice in the same text. Since I'm guilty of a relatively small text formatter, Torino, which has interactive hyphenation, let me describe how it is implemented.

When Torino finds a "victim" it generates a proposal according to the basic rules of Swedish and then asks the user to hyphenate the word, using the proposal as the default value. The hyphenation is then stored in a library, which is saved in a file that has the same name as the document but another extension. Next time the document is formatted, Torino first checks the library and, if nothing is found, asks the user. When checking the library there is a user-adjustable limit so that "inter-nationalisation" is not chosen when "internationalisa-tion" fits into the line. The library file is a text file, so if an erroneous syllabication has found its way into the library it is easily removed.

There are also some other possibilities: you can specify an explicit syllabication file, or none at all. You can also turn off the interactive part, using just the library.

The method I've used is quite simple and can be improved, but it has some advantages over algorithmic and dictionary methods. The main one is that it is almost language independent. Porting Knuth-Liang to languages other than English will imply a good deal of work.

It should be noted that the method I've described is not truly interactive; rather it is a combination of all three (an algorithm is used for the proposals).

To this discussion on hyphenation and internationalisation I'd like to add the following question: Is there any chance that nroff or TeX or any other English-speaking formatter would ever hyphenate the Swedish word "tillaga" correctly? Even with help from the user?
The word is hyphenated "till-laga".
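
The library lookup described above can be sketched roughly as follows. This is only an illustration in Python, not Torino's actual code: the one-entry-per-line file format ("word head-tail"), the ".hyp" extension, the function names, and the reading of the user-adjustable limit as a maximum number of unused columns (`slack`) are all assumptions, since the post does not specify them.

```python
from pathlib import Path


def library_path(document: str, extension: str = ".hyp") -> Path:
    """Same name as the document, another extension (".hyp" is made up)."""
    return Path(document).with_suffix(extension)


def parse_library(text: str) -> dict[str, list[tuple[str, str]]]:
    """Read lines of the assumed form 'word head-tail'.

    Returns {word: [(head, tail), ...]}; a word may have several entries.
    """
    library: dict[str, list[tuple[str, str]]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or "-" not in fields[1]:
            continue  # skip blank or malformed lines
        word, broken = fields
        head, tail = broken.split("-", 1)
        library.setdefault(word, []).append((head, tail))
    return library


def look_up(library, word, room, slack):
    """Return the stored break that best fills `room` columns, or None.

    A stored break is usable when the head plus its hyphen fits in `room`
    and leaves at most `slack` columns unused (the assumed meaning of the
    user-adjustable limit). None means the user would have to be asked.
    """
    best = None
    for head, tail in library.get(word, []):
        used = len(head) + 1  # the head plus the hyphen
        if used <= room and room - used <= slack:
            if best is None or used > len(best[0]) + 1:
                best = (head, tail)
    return best


library = parse_library(
    "internationalisation inter-nationalisation\n"
    "internationalisation internationalisa-tion\n"
    "tillaga till-laga\n"
)
print(look_up(library, "internationalisation", room=20, slack=5))
print(look_up(library, "tillaga", room=8, slack=5))
```

Storing the unbroken word next to its broken form is what lets an entry like "till-laga" work in this sketch: the broken form has one more "l" than "tillaga", so the word cannot be recovered just by deleting the hyphen.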