Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site rtp47.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!mcnc!rti-sel!rtp47!meissner
From: meissner@rtp47.UUCP (Michael Meissner)
Newsgroups: net.internat
Subject: Re: character sets
Message-ID: <214@rtp47.UUCP>
Date: Fri, 11-Oct-85 13:39:51 EDT
Article-I.D.: rtp47.214
Posted: Fri Oct 11 13:39:51 1985
Date-Received: Sun, 13-Oct-85 04:40:47 EDT
References: <719@inset.UUCP>
Reply-To: meissner@rtp47.UUCP (Michael Meissner)
Organization: Data General, RTP, NC
Lines: 28

In article <719@inset.UUCP> mikeb@inset.UUCP (Mike Banahan) writes:
>
>The first problem that strikes typical C programmers is how they should
>represent characters outside the normal ASCII set. They then start thinking
>about using the `top' bit to extend the range of usable characters up to 255.
>Somebody throws in a suggestion that the Japanese will want around 7000
>(seven thousand) characters, so the next idea is to start using shift
>sequences.
>
> ...
>
>But there are problems. First, characters aren't fixed length any more.
>You should see what *that* does to C code. Fixed length arrays aren't
>fixed in length any more, you can't index into them to find the nth
>character, because if it's preceded by a shift code it will mean something
>else.
>

I don't know much about all the ramifications, but I think not having
fixed-length characters would be horribly expensive. I think the best
solution would be a new character type that can hold all of the glyphs
that anybody (not just Western Europe and the USA) needs to use.
Something on the order of 4 octets (32 bits) should be able to hold all
of the information, complete with font and size. The current ISO
eight-bit encoding for Europe/USA would be used when the upper 3 octets
are zero, and it should be easy to isolate the font information by
masking.

Michael Meissner
Data General
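The post proposes a fixed-width 32-bit character that carries the glyph plus font and size, with the existing 8-bit set recovered when the upper three octets are zero, but it does not give a bit layout. The C sketch below assumes one purely for illustration: bits 0-15 hold the glyph code (enough for the roughly 7000 Japanese characters mentioned in the quoted article), bits 16-23 a font number and bits 24-31 a point size. The type `char32` and the `ch32_*` functions and macros are hypothetical names, not an existing API.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical 32-bit character type. The layout is an assumption:
     bits  0-15  glyph code (0-255 = the existing 8-bit set)
     bits 16-23  font number
     bits 24-31  point size                                        */
typedef uint32_t char32;

#define CH32_GLYPH_MASK 0x0000FFFFu
#define CH32_FONT_MASK  0x00FF0000u
#define CH32_SIZE_MASK  0xFF000000u
#define CH32_FONT_SHIFT 16
#define CH32_SIZE_SHIFT 24

/* Pack a glyph code, font and size into one fixed-width character. */
static char32 ch32_make(uint16_t glyph, uint8_t font, uint8_t size)
{
    return (char32)glyph
         | ((char32)font << CH32_FONT_SHIFT)
         | ((char32)size << CH32_SIZE_SHIFT);
}

/* Each field is isolated with one mask (and a shift). */
static uint16_t ch32_glyph(char32 c) { return (uint16_t)(c & CH32_GLYPH_MASK); }
static uint8_t  ch32_font(char32 c)  { return (uint8_t)((c & CH32_FONT_MASK) >> CH32_FONT_SHIFT); }
static uint8_t  ch32_size(char32 c)  { return (uint8_t)((c & CH32_SIZE_MASK) >> CH32_SIZE_SHIFT); }

/* Upper three octets zero: a plain 8-bit character in the
   default font and size, numerically equal to its 8-bit code. */
static int ch32_is_8bit(char32 c) { return (c & 0xFFFFFF00u) == 0; }

/* Every character has the same width, so the nth character is
   reached by ordinary indexing; no shift state must be scanned. */
static char32 ch32_nth(const char32 *s, size_t n) { return s[n]; }
```

For example, `ch32_make('A', 0, 0)` is simply 0x41, the same value as the 8-bit code, while `ch32_make(0x3042, 3, 12)` packs to 0x0C033042 and `ch32_font` recovers 3 from it. The price of this design is storage: each character occupies four bytes instead of one, in exchange for constant-time indexing.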