Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!seismo!mcvax!enea!sommar
From: sommar@enea.UUCP
Newsgroups: comp.std.internat
Subject: Re: Character representation
Message-ID: <2201@enea.UUCP>
Date: Wed, 19-Aug-87 16:58:35 EDT
Article-I.D.: enea.2201
Posted: Wed Aug 19 16:58:35 1987
Date-Received: Sat, 22-Aug-87 09:04:07 EDT
References: <2171@enea.UUCP> <709@maccs.UUCP> <2183@enea.UUCP> <719@maccs.UUCP>
Reply-To: sommar@enea.UUCP (Erland Sommarskog)
Followup-To: comp.std.internat
Organization: ENEA DATA Svenska AB, Sweden
Lines: 49

In a recent article gordan@maccs.UUCP (Gordan Palameta) writes:
>t umlaut or q cedilla would probably be used very rarely, nor is it likely
>that anyone would go to the trouble of designing a font to accommodate such
>characters. Another cost of such generality would be that accents and other
>marks would probably have to be indicated by escape sequences in conjunction
>with the unmodified letter. This would make string-processing software more
>complicated (and slower), and text would be longer.

If you had something like the 8th bit meaning that the following byte is a
modifier, this would increase the length of the text and the string-processing
time only moderately. This solution does not, however, by itself address the
problem that different languages have different collating sequences.

>Not at all, just define a 256-byte lookup table in an include file, and modify
>the code to
>   if (coll[c] >= FIRST_CHAR && coll[c] <= LAST_CHAR)
>with very little loss of efficiency. To accommodate perverse languages like
>Spanish and Polish which insist on two-letter combinations for sorting,

Spanish and Polish are no more perverse than English.

Of course I know about look-up tables. I have myself written a programme that
uses a two-level look-up table for comparing words. (The words are also
transcribed at three levels; you don't want the hyphen in a hyphenated word to
be significant.)
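To make the idea of a two-level look-up table concrete, here is a minimal sketch in C, in the spirit of the quoted `coll[c]` test. Everything in it is an illustrative assumption, not the programme described above: ISO 8859-1 codes, Swedish order with å, ä, ö after z, and the names `primary`, `secondary`, `init_tables`, `coll_compare`, `coll_equal_loose`, `is_letter`, `FIRST_LETTER` and `LAST_LETTER` are all hypothetical. The first level ignores case, the acute accent and hyphens; the second level only breaks ties, so the sketch yields both a loose and a strict notion of equality.

```c
#include <assert.h>

/* Hypothetical two-level collation tables for an 8-bit character set.
   ISO 8859-1 codes and Swedish letter order are assumed for illustration.
   A weight of 0 means "ignore this character at this level". */
static unsigned short primary[256];   /* letters, ignoring case and accents */
static unsigned short secondary[256]; /* tie-breaker: case, accents, hyphen */

#define FIRST_LETTER 257  /* primary weight of 'a' */
#define LAST_LETTER  285  /* primary weight of o-umlaut, last Swedish letter */

void init_tables(void)
{
    int c;
    for (c = 0; c < 256; c++) {
        primary[c] = (unsigned short)c;   /* non-letters sort before letters */
        secondary[c] = 1;
    }
    for (c = 'a'; c <= 'z'; c++) {
        primary[c] = (unsigned short)(256 + c - 'a' + 1); /* 257..282 */
        primary[c - 'a' + 'A'] = primary[c];              /* fold case */
        secondary[c - 'a' + 'A'] = 2;                     /* upper after lower */
    }
    /* Swedish: a-ring, a-umlaut and o-umlaut follow z, in that order. */
    primary[0xE5] = primary[0xC5] = 283;
    primary[0xE4] = primary[0xC4] = 284;
    primary[0xF6] = primary[0xD6] = 285;
    secondary[0xC5] = secondary[0xC4] = secondary[0xD6] = 2;
    /* e-acute sorts as e; its accent only matters at the second level. */
    primary[0xE9] = primary['e'];
    secondary[0xE9] = 3;
    /* The hyphen is ignored at the first level only. */
    primary['-'] = 0;
    secondary['-'] = 4;
}

/* Compare two strings with one weight table, skipping weight-0 characters.
   Returns <0, 0 or >0, like strcmp. */
static int compare_level(const unsigned char *a, const unsigned char *b,
                         const unsigned short *w)
{
    for (;;) {
        while (*a != '\0' && w[*a] == 0) a++;
        while (*b != '\0' && w[*b] == 0) b++;
        if (*a == '\0' || *b == '\0')
            return (*a != '\0') - (*b != '\0');
        if (w[*a] != w[*b])
            return w[*a] < w[*b] ? -1 : 1;
        a++;
        b++;
    }
}

/* Strict ordering: first level, then second level to break ties. */
int coll_compare(const char *s, const char *t)
{
    const unsigned char *a = (const unsigned char *)s;
    const unsigned char *b = (const unsigned char *)t;
    int r;
    assert(s != 0 && t != 0);
    r = compare_level(a, b, primary);
    return r != 0 ? r : compare_level(a, b, secondary);
}

/* Loose equality: same letters, ignoring case, the accent and hyphens. */
int coll_equal_loose(const char *s, const char *t)
{
    return compare_level((const unsigned char *)s,
                         (const unsigned char *)t, primary) == 0;
}

/* Range test on the collation weight, like the quoted coll[c] check. */
int is_letter(unsigned char c)
{
    return primary[c] >= FIRST_LETTER && primary[c] <= LAST_LETTER;
}
```

With these tables "\xE5l" (ål) sorts before "\xE4l" (äl), and "\xE9" "clair" sorts among the e-words, whereas a plain byte comparison of the ISO 8859-1 codes would put ä (0xE4) before å (0xE5) and é (0xE9) after z. "Co-op" and "coop" are equal under `coll_equal_loose` but not under `coll_compare`. Note that a table indexed by single bytes cannot by itself handle the two-letter combinations mentioned above; those would need an extra contraction step.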
But having that in every single programme that does string comparisons? No,
thank you. It increases the complexity and reduces the readability of the
code. It would be much nicer if "ch1 >= ch2" meant that ch1 comes after, or at
the same position as, ch2 in the alphabet we have currently chosen. (It's
unclear what equality means when modified letters are involved. You will
probably need two kinds of equality.)

>Never mind the French; what if things had turned out differently in 1588
>with the Armada, and the Spanish had invented computers?  Or the Chinese?

I just took the French as an example, OK? No matter who had invented
computers, if their language had held the same dominant position that English
has, that language would have set the standard for character representation
with no other language in mind. I took French as the example because it has
plenty of modifiers that are not significant for sorting.

>Followups to alt.universes.

If you find the subject that uninteresting, why did you write the article at
all?
-- 
Erland Sommarskog
ENEA Data, Stockholm
sommar@enea.UUCP