Xref: utzoo comp.text.tex:3453 soc.culture.german:1778
Path: utzoo!attcan!uunet!wuarchive!rice!news
From: dorai@tone.rice.edu (Dorai Sitaram)
Newsgroups: comp.text.tex,soc.culture.german
Subject: Ascii German -> Diacritics
Message-ID: <1990Oct20.042619.10038@rice.edu>
Date: 20 Oct 90 04:26:19 GMT
Sender: news@rice.edu (News)
Organization: Rice University, Houston
Lines: 54

If you're interested in a teensy program that converts "Ascii German"
into "diacritical German", you might want to read this. As I'm pretty
certain the demand will be rather small, I won't post it. Send me
email if you want the program.

Description:

\begin{nomenclature}
Ascii German (AG): the 26-letter German (orthography) commonly used on
"regular" keyboards, viz., ae, oe, ue, ss for the umlauted vowels and
eszett.

Diacritical German (DG): the real stuff, viz., what \"a, \"o, \"u, \ss
produce in TeX's output.
\end{nomenclature}

Why? A lot of people comfortably read and write AG on keyboards
instead of using TeX source (\"a) or even german.sty kind of source
("a). However, when such text eventually gets converted to something
which allows visible umlauts and eszett, we'd like to convert our AG
into the proper diacritical form.

This is slightly tricky since, although coding DG into AG is 1-1, the
reverse (decoding) isn't. One certainly can't convert all occurrences
of ae, oe, ue, ss into the corresponding special characters. E.g.,
Dauer != Da\"ur; wissen != wi{\ss}en; etc.

I brewed a very little lex program "diac" which converts AG into DG,
using context information to figure out which ae/oe/ue/ss get
converted. The output is TeX source style. There is also an easy way
to invoke "diac" as a preprocessor for LaTeX, so you can write your
.tex files in AG style but get .dvi output with flawless diacritics.
Except for Masse/Ma{\ss}e. :-]

Actually, I have certainly not exhausted all the patterns that are
recognizable, though the present program seems fine on some test data
I have.
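
To give a flavour of what context-dependent decoding looks like, here
is a minimal sketch in Python rather than lex. It is not the "diac"
program. The names decode, PATTERN and TEX and the handful of rules
are assumptions chosen purely for illustration, and they are crude
enough to still make bloopers (e.g. on Poet or aktuell).

```python
import re

# Illustrative rules only (assumptions, not the patterns used by "diac"):
#   ae, oe -> always converted
#   ue     -> converted unless preceded by a, e, or q (Dauer, neue, Quelle)
#   ss     -> converted only at the end of a word (Fuss), so that
#             wissen and Masse are left alone
PATTERN = re.compile(r"[Aa]e|[Oo]e|(?<![aeqAEQ])[Uu]e|ss\b")

# Map each matched Ascii digraph to its TeX source form.
TEX = {
    "ae": '\\"a', "oe": '\\"o', "ue": '\\"u',
    "Ae": '\\"A', "Oe": '\\"O', "Ue": '\\"U',
    "ss": "{\\ss}",
}


def decode(text):
    """Convert Ascii German to TeX-style diacritics in a single pass."""
    return PATTERN.sub(lambda m: TEX[m.group(0)], text)


if __name__ == "__main__":
    print(decode("Die Dauer der Uebung"))
```

Run as a script, this prints: Die Dauer der \"Ubung. The lookbehind
in front of "ue" is what keeps Dauer intact; each further blooper
would call for another context rule of the same kind.
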
Luckily, it's easy to add new patterns (and to know what patterns to
add) in the event of future bloopers.

This program isn't necessarily restricted to German. It should be
easily modifiable to any other language that uses diacritics and has a
popular or idiosyncratic Ascii encoding. (E.g., an Indian language in
Ascii w.r.t. any standard representation in Roman diacritics.)
Moreover, as you get the hang of the patterns used, you can devise
your own patterns to get better decodings.

--d