Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!lll-crg!lll-lcc!pyramid!decwrl!sun!guy
From: guy@sun.uucp (Guy Harris)
Newsgroups: comp.lang.c
Subject: Re: sizeof(char)
Message-ID: <9181@sun.uucp>
Date: Wed, 12-Nov-86 06:38:51 EST
Article-I.D.: sun.9181
Posted: Wed Nov 12 06:38:51 1986
Date-Received: Wed, 12-Nov-86 22:08:58 EST
References: <4617@brl-smoke.ARPA> <657@dg_rtp.UUCP>
Organization: Sun Microsystems, Inc.
Lines: 89

> If these data types were to have different sizes, then a few things
> would indeed break, as follows:
> ...

declarations of pointers to fundamental storage units as "char *",
"unsigned char *", etc. rather than as "storage_unit *".  Yes, they *can* be
changed.  Programs that use "char" *can* also be changed to use "long char".
The question is "which is more work"?  I am still not convinced that
changing those declarations is less work than changing code that handles
characters, especially since the latter code will have to be changed anyway
in many cases to make it work with non-ASCII character sets.

> With my proposal, VERY LITTLE need be changed in such code,
> since text handling is already being done with the idea that (char)
> represents a single character (see my NOTE above!);

I'm not talking about code that processes characters; I'm talking about code
that processes storage units.  Maybe I'm biased, since I've spent a fair bit
of time recently working with streams module code, where you do a *lot* of
stuffing of data structures into and extracting data structures from arrays
of storage units, but I'd rather not have to worry about that code, since it
is not the code I'd be changing to internationalize a system.

> with (long char) approaches, a SUBSTANTIAL amount of rework would be
> needed.  To be fair, the amount of rework for (long char) can be reduced
> if one artificially constrains (long char)s so that neither byte is
> allowed to be zero except for the "null character" string terminator.

How much rework is needed to change "strcpy" to "lstrcpy"?  Note that, with
proper ANSI C declarations in , changing the string types from types derived
from "char" to types derived from "long char" will cause the compiler to
flag many of these anyway.

> I finally should remark that Guy Harris shows every sign of having made
> his mind up on the issue in advance of knowing what was proposed.

Oh, good grief.  The only thing I've "made up my mind on" is that I'm not
convinced by the claim that there isn't much work involved in making all C
code work correctly if "char" is not the fundamental unit of storage.

> The fact that he labeled my comments about implications of the strcoll()
> approach "bullshit" and proceeded to explain setlocale() to me indicate
> that he isn't LISTENING to what I'm saying; after all, I'm one of the
> people who decided how those facilities would be specified.

If it is indeed the case that there is more than one way of sorting text in,
say, Oriental languages, then either 1) "setlocale" is a poor name, because
it takes into account more than just the locale, or 2) it is a poor routine,
because it doesn't take into account more than just the locale.

I notice in my copy of "Inside Macintosh" that they *do* support more than
one collating sequence for their extended character set for the benefit of
German (the vowels equipped with diaereses sort in the same place as the
unadorned vowels in the primary ordering sequence for non-German languages,
but sort in the same place as the ligature composed of that vowel and the
letter "e" in the primary ordering sequence for German).  Now I am not
willing to rule out the possibility that a site might want to have documents
both in French and in German.  As such, code that would sort lists of names
in these documents would have to set the "locale" based on an indication of
the language the document is in (not from, say, an environment variable).

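For concreteness, here is a minimal sketch of what such per-document sorting
might look like, assuming "setlocale" and "strcoll" behave as they do in
standard C.  The function name "sort_names_for_document" and the idea of
passing in a per-document locale string are illustrative assumptions, not
anything from the proposal under discussion; which locale names are accepted
(other than "C") depends on the system.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: orders two strings by the current LC_COLLATE rules. */
static int cmp_collate(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/*
 * Hypothetical helper: sort "names" using the collating sequence named by
 * "doc_locale" (taken from the document, not from the environment), then
 * restore whatever collation locale was in effect before the call.
 * Returns 0 on success, or -1 if the locale could not be selected, in
 * which case the array is left untouched.
 */
int sort_names_for_document(const char **names, size_t n,
                            const char *doc_locale)
{
    char saved[256];
    const char *cur = setlocale(LC_COLLATE, NULL);  /* query, don't change */

    if (cur == NULL || strlen(cur) >= sizeof saved)
        return -1;
    /* Copy the name: a later setlocale call may overwrite the string. */
    strcpy(saved, cur);

    if (setlocale(LC_COLLATE, doc_locale) == NULL)
        return -1;

    qsort(names, n, sizeof names[0], cmp_collate);

    setlocale(LC_COLLATE, saved);  /* put the previous collation back */
    return 0;
}
```

A caller handling a German document and a French one would call this once
per document with a different locale string each time.  Note that the locale
is process-wide state, so this sketch is only safe if nothing else in the
program is comparing strings while the sort is in progress.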
The claim you made, that "strcoll() amounts to a declaration that there IS
a natural multibyte collating sequence for any single environment", is a
little hard to parse.  I assume you mean that "by specifying that there is
such a routine, the proposers of strcoll() are declaring that there IS a
natural multibyte collating sequence for any single environment."  Given
that "setlocale" exists, I fail to see how it declares this, unless
"environment" is defined so that an environment always specifies a single
collating sequence.  In the latter case, the claim is true, but trivially so.

> I'm fully prepared to admit that there are pros and cons to any alternative
> solution to the multi-byte character issue (or to bitmap programming
> issues, if that's more your concern), and that one might rationally
> disagree with my proposal because of different value weighting of the
> trade-offs.

Fine.  Are you prepared to admit that there *is* a non-trivial trade-off
involved in the "short char" proposal (i.e., that it is not a given that
few, if any, lines of *existing* code need change so that it can work
equally well in a one-storage-unit "char" and a two-storage-unit "char"
environment), and that some people might rationally disagree with your value
weighting of the changes needed to existing code to make it work in a
two-storage-unit "char" environment and to make it work in a "long char"
environment?
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)