Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!husc6!cmcl2!brl-adm!brl-smoke!gwyn
From: gwyn@brl-smoke.ARPA (Doug Gwyn )
Newsgroups: comp.lang.c,comp.std.internat
Subject: Re: What is a byte
Message-ID: <6252@brl-smoke.ARPA>
Date: Sat, 8-Aug-87 08:40:58 EDT
Article-I.D.: brl-smok.6252
Posted: Sat Aug  8 08:40:58 1987
Date-Received: Sun, 9-Aug-87 11:07:13 EDT
References: <218@astra.necisa.oz> <142700010@tiger.UUCP>
Reply-To: gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>)
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 35
Xref: mnetor comp.lang.c:3547 comp.std.internat:83

In article <899@haddock.ISC.COM> karl@haddock.ima.isc.com (Karl Heuer) writes:
>The problem with your proposal is that it would break existing code that
>assumes sizeof(char) == 1.

Of course, such code is already broken in the international environment.
In fact, in an 8-bit (char) implementation, such code would continue to
work.  In other words, something has to give for internationalized
implementations; the question is what.  With my proposal,
sizeof(short char)==1, so there could be a transition period during
which implementations would make sizeof(char)==sizeof(short char) until
application source has been cleaned up.  (Some developers have been
careful not to rely on sizeof(char)==1 all along, anticipating the day
when this assumption may have to be changed.)
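
For concreteness, here is a minimal sketch of what "not relying on
sizeof(char)==1" looks like in application source.  The helper name
dup_string is hypothetical and not taken from any library; the point is
only that the allocation and copy sizes are scaled by sizeof(char)
instead of silently assuming it is 1.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a string without assuming
 * that sizeof(char) == 1. */
char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;            /* element count, including the '\0' */
    char *p = malloc(n * sizeof(char));  /* scale by element size, not by 1 */

    if (p != NULL)
        memcpy(p, s, n * sizeof(char));  /* copy size scaled the same way */
    return p;                            /* NULL if allocation failed */
}
```

A call such as dup_string("byte") yields a freshly allocated copy that
the caller must free().  Code written this way behaves the same whether
or not an implementation makes sizeof(char) equal to 1.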

>If I were a Japanese user, using a VAX, and I was told that, because
>Japanese characters require more than 8 bits, and because (char) is the
>obvious datatype for characters, and because C requires that nothing be
>smaller than (char), my compiler couldn't address individual bytes, then I
>think I'd start looking for a new vendor or a new programming language.

That's why something has to be done.

As I reported recently, X3J11 has agreed in principle with Bill Plauger's
proposal for a typedef letter_t and a few conversion-oriented functions,
but NO library for letter_t analogous to the standard str*() routines.
This necessitates source-level kludgery for any application for which
portability into a multi-byte character environment is a possibility.
I don't like that very much, but since I'm not expecting to sell software
products to the Japanese I'll go along with it if the vendors think it
will fly.  This seems to be another case of not wanting to do the
technically correct thing if that would require a radical change to
previous practice.  That's a legitimate concern, of course.
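
To illustrate the sort of source-level kludgery I mean, here is a rough
sketch under stated assumptions.  The typedef below is only a guess at
what letter_t might look like (an integer type wide enough for one
character of the target set), and letter_len and letter_cpy are
hypothetical names.  With no library analogous to str*(), each
application ends up writing routines like these for itself.

```c
#include <stddef.h>

/* Assumption: a 16-bit-or-wider unsigned type stands in for letter_t. */
typedef unsigned short letter_t;

/* Hand-written analogue of strlen() for zero-terminated letter_t strings. */
size_t letter_len(const letter_t *s)
{
    const letter_t *p = s;

    while (*p != 0)
        p++;
    return (size_t)(p - s);   /* number of letters, excluding the terminator */
}

/* Hand-written analogue of strcpy(); dst must have room for the terminator. */
letter_t *letter_cpy(letter_t *dst, const letter_t *src)
{
    letter_t *d = dst;

    while ((*d++ = *src++) != 0)
        ;                     /* copy up to and including the terminator */
    return dst;
}
```

Every str*() routine an application uses would need a hand-rolled twin
of this kind, which is exactly the duplication a standard library for
letter_t would have avoided.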

If *I* were a Japanese programmer, I think I'd resent being treated as
a second-class citizen by the programming language.