Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83 (MC840302); site mcvax.UUCP
Path: utzoo!linus!philabs!cmcl2!seismo!mcvax!aeb
From: aeb@mcvax.UUCP (Andries Brouwer)
Newsgroups: net.nlang,net.text
Subject: Re: Re: troff special chars - naming them (about diacritical marks)
Message-ID: <775@mcvax.UUCP>
Date: Thu, 25-Jul-85 23:50:41 EDT
Article-I.D.: mcvax.775
Posted: Thu Jul 25 23:50:41 1985
Date-Received: Tue, 30-Jul-85 06:24:10 EDT
References: <1065@diku.UUCP> <763@mcvax.UUCP> <1070@diku.UUCP>
Reply-To: aeb@mcvax.UUCP (Andries Brouwer)
Organization: CWI, Amsterdam
Lines: 135
Xref: linus net.nlang:3132 net.text:457

Last time I just mentioned a few accents that occurred to me while
writing - let me now give a more detailed overview of what accents exist.

1. Accents on top

- Acute accent (') occurs on top of almost anything; many languages have
  'a 'e 'i 'o 'u ; Icelandic also 'y ; Slovak also 'y 'r 'l ;
  Polish also 'c 'n 's 'z ; Latvian has a character that is sometimes
  printed as 'g (see below); etc.
  Note that the ' on 'a has not the same slope as the ' on 'i .
- Grave accent (`) occurs in many languages in `a `e `i `o `u ;
  Slovene `r
- Circumflex (^) occurs in many languages in ^a ^e ^i ^o ^u ;
  Esperanto has ^c ^g ^h ^j ^s ; accented Latvian has ^l .
- Trema/Diaeresis/Umlaut (::/") occurs as umlaut in many languages in
  "a "o "u (e.g. German, Slovak, Finnish, Swedish, Turkish, Hungarian);
  as trema in ::a ::e ::i ::o ::u .
- Hacek (h\'a\vcek) (v) occurs in many Slavic languages;
  Czech has ve vc vn vs vr vz ; Slovak also vD ; Esperanto vu .
  In transcriptions one meets other letters with hacek, e.g. Armenian vj .
- When the letter that should get the hacek is tall, then it gets a comma
  at the upper right instead: Czech has ,d ,t ; Slovak also ,l .
- Dot above (:) occurs in various places; the most obvious ones are
  :z in Polish and :e in Lithuanian, but I found it also e.g. as :n
  in the African language Bamoum.
- Macron (overline) (-) occurs as -a -e -i in Latvian, as -u in
  Lithuanian and is otherwise generally used to denote the length of vowels.
- Corona (circle above) (o) is found in Scandinavian oa and Czech ou .
- Tilde (~) is found in Spanish ~n , Portuguese ~a ~o and otherwise e.g.
  in accented Baltic languages: ~a ~e ~i ~o ~y ~m ~n ~l ~r ~.e .
- Breve (half circle above) (U) is found in Rumanian Ua , Turkish Ug ,
  Vietnamese Ua and is otherwise generally used to denote short vowels.
- Double acute ('') is found in Hungarian ''o and ''u .
- High tone mark (question mark without dot) (?) is found in Vietnamese
  ?a ?o ?u .
- In Latvian the palatalized sounds have a comma below, as we shall see,
  but in ,g there is no room for the , to go below, and one finds it
  on top instead. I have met three variations: 'g (acute accent),
  ,g (high centered comma) and I,g (high centered inverted comma).
  Sometimes the high centered inverted comma is met in other places;
  I have seen I,k and I,t in transliterated Armenian and I,p in Sorbian.
- In old Croatian texts one finds the double grave accent (``) as in
  ``a ``e ``i ``r .

2. Accents below

- Cedille (,) or left hook occurs in French ,c ; in Turkish ,s ;
  in Rumanian ,s ,t ; in Latvian ,k ,l ,n ,r (and ,K ,L ,N ,R ,G -
  for ,g see above). These hooks do not always resemble a comma.
- Rude (L) or right hook occurs in Polish La Le ; in Thai and old Norse Lo ;
  in Lithuanian La Li Lu ; in old Latvian Le Lk .
  These hooks start right from the center, sometimes almost at the center,
  sometimes at the lower right hand corner.
- Dot below (.) occurs in Vietnamese .a .e .o ; in transliterations from
  Arabic or Sanskrit one meets .d .t .s .r .h etc.
- Corona below (0) occurs in transliterations, often to indicate that
  a sonorant has syllabic value: 0m 0n 0l 0r 0s .
- Breve below (u) occurs in transliteration of Sanskrit and Hittite uh .
- Double dot below (..) seems to occur in transliterated Urdu ..t .
- Vertical bar below (|) seems to occur in Yoruba |o .
- Circumflex below (A) seems to occur in Bamileki and Venda Ae .

3. Accents on more than one letter simultaneously

- An arc on top may join two letters, as in the transliteration of the
  Russian "reflected R" as IU{ia} .
- In Tagalog a tilde occurs on the ng digraph: ~{ng} .
- Underline (_) is often used to indicate that two letters transliterate
  one sound, e.g. in various Indian languages _{kh} .
- Similarly the double underline (=) is sometimes used when the combination
  of two letters must represent two distinct sounds, e.g. Urdu ={gh} .
  (See also the ligature above.)

Note that I do not propose a naming scheme for accented symbols here -
the chosen denotations are purely ad hoc.
Simple schemes as discussed earlier almost always work, but fail when one
letter carries several diacritical marks. In Vietnamese one finds letters
with acute and circumflex side by side (so that it looks like a rotated
'less than or equals' sign): {'^}a {'^}e {'^}o and towers like
'^o ^a. ?Ua ~^e (read from top to bottom).
In Lithuanian one meets ~.e ~u, {.'}e '-u etc.
Clearly, when symbols can have three or more accents in various mutual
positions, some nontrivial grammar is needed to describe the situation.

4. Special symbols

Various ligatures are conventionally treated as a single symbol.
One has Dutch ij , German ss (or sz), French oe and Scandinavian
(and Latin) ae .
Turkish has dotless i (.i).
Icelandic has the thorn (bp) or (th).
Some symbols with a crossbar are Polish /l and /L ; Scandinavian /o and /O ;
Vietnamese and Yugoslavian and Icelandic -d and -D ; Icelandic +d (eth).

Well, this is what I have found so far. Where I said "seems to occur",
the information is quoted from an old draft version of ISO standard
ISO 5426 (dated 1975-07-10).
I would be thankful if people mailed me their additions and corrections.
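
P.S. To make the remark in section 3 about a "nontrivial grammar" a bit
more concrete, here is a minimal sketch in Python of one possible way to
describe a letter that carries several marks. It is only an illustration
under assumptions: the names AccentedLetter, MARK_NAMES and describe are
hypothetical, the mark symbols are the ad hoc ones used in this article,
and nothing here is a proposed naming scheme. Marks above and below are
kept as stacks of levels read from top to bottom, and each level may hold
several marks side by side.

```python
from dataclasses import dataclass, field

# Hypothetical table: a few of the ad hoc mark symbols from this article.
MARK_NAMES = {
    "'": "acute",
    "`": "grave",
    "^": "circumflex",
    "~": "tilde",
    "U": "breve",
    "?": "high tone mark",
    "-": "macron",
    ".": "dot",
}


@dataclass
class AccentedLetter:
    """A base letter plus stacks of marks (an assumed model, not a standard)."""
    base: str
    # Each stack is a list of levels, read from top to bottom.
    # Each level is a tuple of marks that sit side by side.
    above: list = field(default_factory=list)
    below: list = field(default_factory=list)


def describe(letter):
    """Return a plain-text description of the letter and its marks."""
    def levels(stack):
        return " over ".join(
            "[" + " + ".join(MARK_NAMES[m] for m in level) + "]"
            for level in stack
        )

    parts = []
    if letter.above:
        parts.append("above " + levels(letter.above))
    if letter.below:
        parts.append("below " + levels(letter.below))
    if not parts:
        return letter.base
    return letter.base + ": " + "; ".join(parts)


# The Vietnamese tower '^o : acute on top of circumflex on o.
tower = AccentedLetter("o", above=[("'",), ("^",)])
# The side-by-side case {'^}a : acute and circumflex on one level.
side_by_side = AccentedLetter("a", above=[("'", "^")])
# The case ^a. : circumflex above and dot below.
both = AccentedLetter("a", above=[("^",)], below=[(".",)])

for letter in (tower, side_by_side, both):
    print(describe(letter))
```

The point of the two-level structure is that a flat list of marks cannot
tell the tower '^o from the side-by-side {'^}o , whereas a list of levels
can.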