Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!ucla-cs!wales
From: wales@ucla-cs.UUCP
Newsgroups: comp.std.internat
Subject: Re: Character representation
Message-ID: <7780@shemp.UCLA.EDU>
Date: Wed, 19-Aug-87 00:19:29 EDT
Article-I.D.: shemp.7780
Posted: Wed Aug 19 00:19:29 1987
Date-Received: Fri, 21-Aug-87 04:52:01 EDT
References: <15381@mordor.s1.gov>
Sender: root@CS.UCLA.EDU
Reply-To: wales@CS.UCLA.EDU (Rich Wales)
Organization: UCLA Computer Science Department
Lines: 44

In article <15381@mordor.s1.gov> pom@s1-under.UUCP () writes:
>Besides I have VERY CONSTRUCTIVE (insiders info) FACT on cedillas,
>umlauts, haceks, and other such ... [modifiers] namely : In all
>languages I know, there are many kinds, but ANY PARTICULAR LETTER
>either has one - or it does not. That means that we need to reserve
>just 1 bit (0.. unmodified) and (1.. modified). to take care of dozens
>of languages.
>
>To disprove my conjecture, name one language with Latin-based alphabet
>and one letter in that alphabet, which admits more than one modifier.

Good try, really, but there are several counterexamples:

Czech. "U" can have an acute accent, or a small circle. Also, "E" can
have either an acute accent or a "hacek" (V-like accent).

French. "E" can have an acute, grave, or circumflex accent, or a
diaeresis (two dots). "A" and "U" can have either a grave or a
circumflex accent. "I" can have either a circumflex accent or a
diaeresis.

Hungarian. "O" and "U" can have a regular acute accent, a regular
umlaut (two dots), or a "long" umlaut (two acute accents).

Polish. "Z" can have an acute accent, or a single dot.

Romanian. "A" can have a breve ("short" sign, like a small U) or a
circumflex.

Swedish. "A" can have either an umlaut, or a small circle.

Vietnamese. There are several different kinds of accent marks used in
this language to indicate tones (syllable pitch patterns), and as far
as I'm aware, any of these accents may occur on any vowel. (And, yes,
modern Vietnamese *does* use the Latin alphabet.)

It may or may not be relevant, for purposes of this discussion, to note
that some of the above languages treat the "modified" versions of their
letters as completely distinct letters in their own right.

--
Rich Wales // UCLA Computer Science Department // +1 213-825-5683
3531 Boelter Hall // Los Angeles, California 90024-1596 // USA
wales@CS.UCLA.EDU      ...!(ucbvax,rutgers)!ucla-cs!wales
"Sir, there is a multilegged creature crawling on your shoulder."