Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!sri-spam!mordor!lll-crg!nike!oliveb!sun!guy
From: guy@sun.uucp (Guy Harris)
Newsgroups: net.lang.c
Subject: Re: sizeof(char)
Message-ID: <8907@sun.uucp>
Date: Tue, 4-Nov-86 16:24:05 EST
Article-I.D.: sun.8907
Posted: Tue Nov 4 16:24:05 1986
Date-Received: Wed, 5-Nov-86 05:35:19 EST
References: <4617@brl-smoke.ARPA> <657@dg_rtp.UUCP> <55@cartan.Berkeley.EDU> <663@dg_rtp.UUCP> <5141@brl-smoke.ARPA>
Organization: Sun Microsystems, Inc.
Lines: 71

> X3J11 as it stands requires sizeof(char)==1. I have proposed that
> this requirement be removed, to better support applications such as
> Asian character sets and bitmap display programming. Along with
> this, I proposed a new data type such that sizeof(short char)==1.
> It turns out that the current draft proposed standard has to be
> changed very little to support this distinction between character
> objects (char) and smallest-addressable objects (short char). This
> is much better, I think, than a proposal that introduced (long char)
> for text characters.

Why? If this is the AT&T proposal, it did *not* "introduce (long char)
for text characters"; it introduced (long char) for *long* text
characters. "char" is still to be used when processing text that does
not include long (16-bit) characters. I believe the theory here was
that requiring *all* programs that process text ("cat" doesn't count;
it doesn't - or, at least, shouldn't - process text) to process them in
16-bit blocks might cut their performance to a degree that customers
who would not use the ability to handle Kanji would find unacceptable.
I have seen no data to confirm or disprove this.

(Changing the meaning of "char" does not directly affect the support of
"bitmap display programming" at all.
It only affects applications that display things like Asian character
sets on bitmap displays, but it doesn't affect them any differently
than it affects applications that display them on "conventional"
terminals that support those character sets.)

> Unfortunately, much existing C code believes that "char" means "byte".
> My proposal would allow implementors the freedom to decide whether
> supporting this existing practice is more important than the benefits
> of making a distinction between the two concepts.

Both "short char"/"char" and "char"/"long char" make a distinction
between the two concepts; one may have aesthetic objections to the way
the latter scheme draws the distinction, but that's another matter.
(Is 16 bits enough if you want to give every single character a code of
its own?)

> It is possible to write code that doesn't depend on sizeof(char)==1,
> and some C programmers are already careful about this.

It is possible to write *some* code so that it doesn't depend on
sizeof(char)==1. Absent a data type one byte long, other code is
difficult at best to write this way.

> Transition to the more general scheme would occur gradually (if at all) for
> existing C implementations, with only implementors of systems for
> the Asian market and of bitmap display architectures initially taking
> advantage of the opportunity to make these types different sizes.

I think "if at all" is appropriate here. There are a *lot* of
interfaces that think that "char" is a one-byte data type; e.g.,
"read", "write", etc. I see no evidence that converting existing code
and data structures to use "short char" would be anything other than
highly disruptive.

Adding "long char" would permit new programs to be written to support
long characters, and permit existing programs to be rewritten to
support them, without breaking existing programs; this indicates to me
that it would make it much more likely that "long char" would be widely
adopted and used than that "short char" would.

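To make the dependence on sizeof(char)==1 concrete, here is a minimal
sketch (an illustration only; the helper names dup_assuming_byte and
dup_portable are hypothetical). Library routines such as "malloc" and
"memcpy" take their counts in bytes, so code that passes a count of
chars to them is quietly relying on the two units being the same:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicates a string while assuming that
 * sizeof(char) == 1.  The value n is used both as a count of chars
 * and as a count of bytes. */
char *dup_assuming_byte(const char *s)
{
    size_t n = strlen(s) + 1;       /* chars, including the '\0' */
    char *p = malloc(n);            /* a byte count is expected here */

    if (p != NULL)
        memcpy(p, s, n);            /* and here */
    return p;
}

/* Hypothetical helper: same job, but the char count is scaled by
 * sizeof(char) wherever a byte count is required. */
char *dup_portable(const char *s)
{
    size_t n = strlen(s) + 1;       /* chars, including the '\0' */
    char *p = malloc(n * sizeof(char));

    if (p != NULL)
        memcpy(p, s, n * sizeof(char));
    return p;
}
```

Where sizeof(char) is 1, as the draft quoted above requires, the two
routines behave identically; they would differ only under a scheme in
which "char" became wider than the smallest addressable object. The
scaling is easy to add here, but it assumes every byte-counting
interface the program touches can be found and fixed the same way.
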
I see no reason why a proposal that would, quite likely, lead to two
different C-language environments existing in parallel for a long time
to come is superior to one that would permit environments to add on the
ability to handle long characters and thus would make it easier for
them to do so and thus more likely that they would.

(This is especially true when you consider that most of the programs in
question would have to be changed quite a bit to support Asian
languages *anyway*; just widening "char" to 16 bits, recompiling them,
and linking them with a library with a brand new standard I/O, etc.
would barely begin to make them support those languages.)
--
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)