Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!oliveb!sun!plaid!chuq
From: chuq%plaid@Sun.COM (Chuq Von Rospach)
Newsgroups: comp.text.desktop
Subject: Re: Hy-phen-a-tion dic-tion-ary
Message-ID: <20902@sun.uucp>
Date: Thu, 11-Jun-87 12:21:08 EDT
Article-I.D.: sun.20902
Posted: Thu Jun 11 12:21:08 1987
Date-Received: Sat, 20-Jun-87 10:13:56 EDT
Sender: news@sun.uucp
Distribution: comp
Lines: 76
Approved: desktop-request%plaid@sun.com
Summary: May I introduce you to TeX: The Program?

From: rokicki@rocky.stanford.edu (Tomas Rokicki)
Date: 8 Jun 87 22:52:39 GMT
Organization: Stanford University Computer Science Department

In article <20583@sun.uucp>, chuq%plaid@Sun.COM (Chuq Von Rospach) writes:
> Yep, storing the dictionary is the heart of the problem. There is a
> guy at Stanford who did his thesis on the subject. I forget his name
> but his thesis is published, so people at Stanford should be able to
> locate it. I'd wager that there is available software to demonstrate
> his thesis. He found some pretty neat ways to compress the dictionary.

Please pick up a copy of Computers & Typesetting, Volume B, entitled TeX: The Program, by Don Knuth, published by Addison Wesley Publishing Company. I paid $34.95 for my copy. You should start at section 919, page 386, and read. It describes an implementation of Frank M. Liang's hyphenation algorithm, which is described more fully in Liang's PhD thesis from Stanford University. (If anyone wants a copy of this, mail me and I'll give you details.)

The dictionary is not compressed; rather, patterns are found and used. These patterns are amazingly regular and dependable. An exception dictionary is used for those few exceptions (I believe a few dozen have been found.)

Please, anyone writing or considering writing a typesetting program, consider using these algorithms. They are fast, small, use little data space, and *work*.
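
To make the "patterns, not a compressed dictionary" idea concrete, here is a minimal sketch of pattern-based hyphenation in Python. It is not the code from TeX: TeX stores its patterns in a packed trie, while this sketch uses a plain dict and tries every substring. The names (parse_pattern, hyphenate, TOY_PATTERNS, TOY_EXCEPTIONS) are made up for illustration, and the handful of patterns and the single exception entry are a toy set chosen to handle one word, not a real pattern file. Digits in a pattern are weights for the gaps between letters; the highest weight at each gap wins, and an odd result permits a break there.

```python
import re


def parse_pattern(pat):
    """Split a pattern such as 'hen5at' into ('henat', [0, 0, 0, 5, 0, 0])."""
    letters = re.sub(r"\d", "", pat)
    weights = [0] * (len(letters) + 1)
    gap = 0
    for ch in pat:
        if ch.isdigit():
            weights[gap] = int(ch)  # weight for the gap before the next letter
        else:
            gap += 1
    return letters, weights


# Toy data, assumed for illustration only (not a real pattern file).
TOY_PATTERNS = dict(
    parse_pattern(p)
    for p in "hy3ph he2n hena4 hen5at 1na n2at 1tio 2io o2n".split()
)
TOY_EXCEPTIONS = {"table": "ta-ble"}  # hypothetical exception dictionary


def hyphenate(word, patterns=TOY_PATTERNS, exceptions=TOY_EXCEPTIONS,
              left_min=2, right_min=3):
    lower = word.lower()
    if lower in exceptions:          # the exception dictionary is consulted first
        return exceptions[lower]
    padded = "." + lower + "."       # '.' marks the word boundaries
    points = [0] * (len(padded) + 1)
    # Brute force: look up every substring and keep the largest weight per gap.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = patterns.get(padded[start:end])
            if weights:
                for k, w in enumerate(weights):
                    points[start + k] = max(points[start + k], w)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        # points[i + 1] is the gap before word[i] (shifted by the leading '.')
        if points[i + 1] % 2 == 1:   # odd weight: a break is allowed here
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("hyphenation"))
```

With this toy set, hyphenate("hyphenation") gives hy-phen-ation. The left_min and right_min parameters are an assumption of this sketch: they keep at least that many letters on either side of a break.
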
There is no excuse for the poor hyphenation so many systems give you, and also no reason to have to put in hyphens by hand. If you desire, you can rip the code right out of TeX, since the source code is so readily available and readable; Don Knuth specifically allows this.

Also, when using systems with automatic hyphenation, please look over the hyphens you are given and ensure they are reasonable. There are many cases where a break would be fine in general, but not in that particular context. For instance,

	Automatic hyphenation systems used in auto-

Here the context and hyphenation lead the reader to expect the word to be `automatic', but the word might be `automobiles', jarring the reader ever so slightly. There are much better examples, but none spring to mind at the moment.

Liang's algorithm can be adapted to foreign languages fairly easily. Some languages, such as German, which change the spelling of words when hyphenated, can cause some difficulty.

So, here's a question for y'all. How should eighteen be hyphenated? Dictionaries disagree; I want explanations of your choice. Neither eight-een nor eigh-teen looks quite right. I vote for eight-teen, but this violates accepted English hyphenation rules; this is one of those words you should avoid hyphenating.

Bad hyphenation exists all over, and can be quite comical. The text `Introducing Artificial Intelligence' by G. L. Simons hyphenates vie-
wed. That word, again, was viewed. Took you a second shot to read it, eh? `Logic Design Principles' by Edward J. McCluskey abounds with bad hyphenations; pick up a copy and look at a random page. For instance, `wav-eform.'

And, lastly, for the sake of Pete, do not break paragraphs into lines a line at a time!

Sorry for the length; I get carried away.
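
On the last point, about not breaking paragraphs a line at a time: the sketch below contrasts a greedy, line-at-a-time breaker with one that looks at the whole paragraph. It is a simplified stand-in, not TeX's line breaker. The cost used here (squared unused space on every line but the last) and the function names greedy_breaks and optimal_breaks are assumptions made for illustration; TeX's own measure of a bad line is more elaborate.

```python
def greedy_breaks(words, width):
    """Fill each line as far as it will go, never looking ahead."""
    lines, current = [], []
    for w in words:
        if current and len(" ".join(current + [w])) > width:
            lines.append(" ".join(current))
            current = [w]
        else:
            current.append(w)
    if current:
        lines.append(" ".join(current))
    return lines


def optimal_breaks(words, width):
    """Choose all breaks together, minimizing total squared trailing space."""
    n = len(words)
    best = [0.0] + [float("inf")] * n   # best[j]: least cost to set words[:j]
    prev = [0] * (n + 1)                # prev[j]: start of the last line in that setting
    for j in range(1, n + 1):
        length = -1
        for i in range(j - 1, -1, -1):  # candidate last line is words[i:j]
            length += len(words[i]) + 1
            if length > width and i < j - 1:
                break                   # too long (a lone over-long word is still allowed)
            slack = 0 if j == n else width - length  # the last line costs nothing
            cost = best[i] + slack ** 2
            if cost < best[j]:
                best[j], prev[j] = cost, i
    lines, j = [], n
    while j > 0:                        # walk the chosen breaks backwards
        i = prev[j]
        lines.append(" ".join(words[i:j]))
        j = i
    return lines[::-1]


words = "aaa bb cc ddddd".split()
print(greedy_breaks(words, 6))   # ['aaa bb', 'cc', 'ddddd']
print(optimal_breaks(words, 6))  # ['aaa', 'bb cc', 'ddddd']
```

The greedy version packs the first line full and is then stuck with a nearly empty second line; the whole-paragraph version accepts a slightly loose first line to get a more even result overall.
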
-tom

----------------------------------------
Submissions to:   desktop%plaid@sun.com -OR- sun!plaid!desktop
Administrivia to: desktop-request%plaid@sun.com -OR- sun!plaid!desktop-request
Paths: {ihnp4,decwrl,hplabs,seismo,ucbvax}!sun
Chuq Von Rospach	chuq@sun.COM		Delphi: CHUQ

Now, where did my ex-wife put my Fairy Dust?