Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 alpha 4/15/85; site enea.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!gatech!seismo!mcvax!enea!sommar
From: sommar@enea.UUCP (Erland Sommarskog)
Newsgroups: net.internat
Subject: Re: Hyphenation (mainly interactive)
Message-ID: <1090@enea.UUCP>
Date: Sat, 16-Nov-85 12:15:21 EST
Article-I.D.: enea.1090
Posted: Sat Nov 16 12:15:21 1985
Date-Received: Thu, 21-Nov-85 07:05:17 EST
References: <471@harvard.ARPA> <773@mmintl.UUCP> <968@enea.UUCP> <501@harvard.ARPA>
Reply-To: sommar@enea.UUCP (Erland Sommarskog)
Distribution: net
Organization: Enea Data, Sweden
Lines: 59

In article <501@harvard.ARPA> David A. Kosower discusses hyphenation,
and argues in favour of algorithmic hyphenation over interactive
and dictionary methods. In particular he presents the Knuth-Liang
algorithm, which is really a combination of a dictionary and an
algorithm. I shall not discuss this algorithm, since I have very
little experience of TeX.

Instead I'm going to develop my thoughts about interactive hyphenation.
David gives two main arguments against hyphenating interactively:
1) It takes too much time. Having to hyphenate the same words every time
   you format a document is no fun.
2) Humans also make errors. If I've understood him right, he's implying
   that an average writer would make more erroneous hyphenations than
   a good computer algorithm.
   
I start with 2).
After having read the number of examples, I'm inclined to agree with
David, as long as we stick to English. My reference point so far has been
Swedish, and I claim that hyphenation is simpler in our language. It has
some basic rules which are overridden by concatenated words (easy for a
human, a bit more difficult for a computer) and loan words (can be a
difficulty even for a native Swede sometimes). Since concatenated words
are very frequent in Swedish, a method like Knuth-Liang using fragments
of words is probably superior to a true dictionary method.

And so to 1)
Of course it would not be acceptable for a text formatter to keep asking
about the same word in the same text. Since I'm guilty of a relatively
small text formatter, Torino, which has interactive hyphenation, let me
describe how it is implemented.
  When Torino finds a "victim" it works out a proposal according to the
basic rules of Swedish and then asks the user to hyphenate the word, using
the proposal as the default value. The hyphenation is then stored in
a library, which is saved in a file with the same name as the document
but another extension. The next time the document is formatted,
Torino first checks the library, and only if nothing is found there does
it ask the user. When checking the library there is a user-adjustable
limit, so that "inter-nationalisation" is not chosen when
"internationalisa-tion" fits into the line.
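
  The lookup order described above could be sketched roughly as follows.
This is not Torino's code. The names (propose, hyphenate, best_break,
ask_user), the dictionary used as the library and the toy proposal rule
are all assumptions made for illustration, and the user-adjustable limit
is not modelled: best_break simply takes the last stored break that fits.

```python
def propose(word):
    """Toy stand-in for the rule-based proposal: break before the last
    consonant that is followed by a vowel. NOT the real Swedish rules."""
    vowels = "aeiouyåäö"
    for i in range(len(word) - 2, 1, -1):
        if word[i] not in vowels and word[i + 1] in vowels:
            return word[:i] + "-" + word[i:]
    return word


def hyphenate(word, library, ask_user):
    """Return the stored hyphenation, asking the user only on a miss."""
    if word not in library:
        # ask_user(word, default) returns the user's answer; the
        # rule-based proposal is offered as the default value.
        library[word] = ask_user(word, propose(word))
    return library[word]


def best_break(hyphenated, room):
    """Split at the last break point whose first part, hyphen included,
    fits in `room` columns. Returns (head, tail), or None if none fits."""
    parts = hyphenated.split("-")
    for i in range(len(parts) - 1, 0, -1):
        head = "".join(parts[:i]) + "-"
        if len(head) <= room:
            return head, "".join(parts[i:])
    return None
```

With the entry "inter-nationalisa-tion" and 20 columns left on the line,
best_break prefers the later break; the earlier one is used only when
the room runs out.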
  The library file is a text file, so if an erroneous syllabication
has found its way into the library, it is easily removed.
  There are also some other possibilities: You can specify an explicit
syllabication file, or none at all. You can also turn off the
interactive part, using just the library.
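
  A library kept as a plain text file might be handled along these
lines. The layout (one entry per line, the plain word followed by its
hyphenated form) and the ".hyp" extension are assumptions for the sake of
the sketch; the actual layout of Torino's file is not described here.

```python
import os


def library_path(document, extension=".hyp"):
    """Same name as the document, another extension (".hyp" is made up)."""
    return os.path.splitext(document)[0] + extension


def load_library(path):
    """Read 'word hyphenated-form' lines into a dict. A missing file, or
    path=None for 'no file at all', gives an empty library."""
    library = {}
    if path is None or not os.path.exists(path):
        return library
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) == 2:      # silently skip blank or odd lines
                word, hyphenated = fields
                library[word] = hyphenated
    return library


def save_library(path, library):
    """Write the library back, one entry per line, sorted for easy editing."""
    with open(path, "w", encoding="utf-8") as f:
        for word in sorted(library):
            f.write(word + " " + library[word] + "\n")
```

Storing the plain word next to its hyphenated form means an entry can be
corrected or deleted with any text editor, and it also allows forms that
do not simply insert hyphens into the word.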
  The method I've used is quite simple and can be improved, but it has
some advantages over algorithmic and dictionary methods. The main one
is that it is almost language independent. Porting Knuth-Liang to
languages other than English will involve a good deal of work.
  It should be noted that the method I've described is not truly
interactive; rather, it is a combination of all three (the algorithm
being used for the proposals).

To this discussion of hyphenation and internationalisation I'd like
to add the following question: Is there any chance that nroff or TeX
or any other English-speaking formatter would ever hyphenate the
Swedish word "tillaga" correctly? Even with help from the user?
The word is hyphenated "till-laga".
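
What makes this word awkward is that the break adds a letter, so a list
of break positions inside the word cannot express it. A minimal sketch of
one way to store such a case, assuming a hypothetical exception table
that keeps both halves of the broken word explicitly (the table and the
function name are made up for illustration):

```python
# Hypothetical exception table: word -> (part before the break, part after).
# Joining the two parts does NOT give back the word; that is the point.
SPELLING_CHANGING_BREAKS = {
    "tillaga": ("till", "laga"),
}


def break_word(word):
    """Return (head with hyphen, tail) for a word in the table, else None."""
    parts = SPELLING_CHANGING_BREAKS.get(word)
    if parts is None:
        return None
    head, tail = parts
    return head + "-", tail
```

Unbroken, the word is still set as "tillaga"; the extra "l" appears only
when the line is actually broken there.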