Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site rtp47.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!mcnc!rti-sel!rtp47!meissner
From: meissner@rtp47.UUCP (Michael Meissner)
Newsgroups: net.internat
Subject: Re: character sets
Message-ID: <214@rtp47.UUCP>
Date: Fri, 11-Oct-85 13:39:51 EDT
Article-I.D.: rtp47.214
Posted: Fri Oct 11 13:39:51 1985
Date-Received: Sun, 13-Oct-85 04:40:47 EDT
References: <719@inset.UUCP>
Reply-To: meissner@rtp47.UUCP (Michael Meissner)
Organization: Data General, RTP, NC
Lines: 28

In article <719@inset.UUCP> mikeb@inset.UUCP (Mike Banahan) writes:
>
>The first problem that strikes typical C programmers is how they should
>represent characters outside the normal ASCII set. They then start thinking
>about using the `top' bit to extend the range of usable characters up to 255.
>Somebody throws in a suggestion that the Japanese will want around 7000
>(seven thousand) characters, so the next idea is to start using shift
>sequences.
>
>	...
>
>But there are problems. First, characters aren't fixed length any more.
>You should see what *that* does to C code. Fixed length arrays aren't
>fixed in length any more, you can't index into them to find the nth
>character, because if it's preceded by a shift code it will mean something
>else.
>

I don't know all of the ramifications, but I think that giving up fixed
length characters would be horribly expensive.  I think that the best solution
would be a new character type that can hold all of the glyphs (spelling?)
that anybody (not just Western Europe & the USA) needs to use.  I would think
that something on the order of 4 octets (32 bits) should be able to hold
all of the information, complete with font/size.  The current ISO eight bit
encoding for Europe/USA would be used if the upper 3 octets were zero, and
it should be easy to isolate the font info via masking.
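
As a rough sketch of what such a type might look like in C: the bit layout
below (top octet for font/size, low three octets for the character code) is
only an assumption for illustration, and the names xchar_t, xchar_make,
xchar_font, xchar_code, xchar_is_8bit and xstr_nth are all made up.  Nothing
here comes from any standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit character type.  Assumed layout (not from any
 * standard): bits 31..24 = font/size attributes, bits 23..0 = code. */
typedef uint32_t xchar_t;

#define XCHAR_FONT_SHIFT 24
#define XCHAR_FONT_MASK  ((xchar_t)0xFF000000u)
#define XCHAR_CODE_MASK  ((xchar_t)0x00FFFFFFu)

/* Pack a font/size octet and a character code into one xchar_t. */
static inline xchar_t xchar_make(unsigned font, uint32_t code)
{
    assert(font <= 0xFFu);                   /* font must fit in one octet */
    assert((code & ~XCHAR_CODE_MASK) == 0);  /* code must fit in 24 bits   */
    return ((xchar_t)font << XCHAR_FONT_SHIFT) | code;
}

/* Isolate the font/size octet by masking and shifting. */
static inline unsigned xchar_font(xchar_t c)
{
    return (unsigned)((c & XCHAR_FONT_MASK) >> XCHAR_FONT_SHIFT);
}

/* Isolate the character code by masking off the font octet. */
static inline uint32_t xchar_code(xchar_t c)
{
    return c & XCHAR_CODE_MASK;
}

/* Nonzero when the upper three octets are zero, i.e. the value can be
 * read as a plain eight bit character. */
static inline int xchar_is_8bit(xchar_t c)
{
    return (c & ~(xchar_t)0xFFu) == 0;
}

/* Every element has the same width, so the nth character is an
 * ordinary array index with no scan for shift codes. */
static inline xchar_t xstr_nth(const xchar_t *s, size_t n)
{
    return s[n];
}
```

With this layout, xchar_make(0, 'A') is just the eight bit value of 'A', while
xchar_make(3, 0x1234) carries font 3 and a code above 255.  An array of
xchar_t stays fixed in length, at four times the storage of a char array.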

	Michael Meissner
	Data General