Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!brl-adm!brl-smoke!gwyn
From: gwyn@brl-smoke.ARPA (Doug Gwyn )
Newsgroups: comp.lang.c
Subject: Re: sizeof(char)
Message-ID: <5359@brl-smoke.ARPA>
Date: Wed, 12-Nov-86 21:04:19 EST
Article-I.D.: brl-smok.5359
Posted: Wed Nov 12 21:04:19 1986
Date-Received: Wed, 12-Nov-86 23:55:53 EST
References: <4617@brl-smoke.ARPA> <657@dg_rtp.UUCP>
Reply-To: gwyn@brl.arpa (Doug Gwyn (VLD/VMB) )
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 61

In article <9181@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>If it is indeed the case that there is more than one way of sorting text in,
>say, Oriental languages, then either 1) "setlocale" is a poor name, because
>it takes into account more than just the locale, or 2) it is a poor routine,
>because it doesn't take into account more than just the locale.

The name is short for "set locale-specific information", which reflects the main motivation for the function. There were several suggestions for the name, but we couldn't find one that we liked better, other than contractions of "set environment", which had to be rejected for the obvious reason.

Actually, it WAS intended that setlocale() indeed mean "change or query the program's entire LOCALE or portions thereof", where the term "locale" was to be defined in section 1.5. However, something appears to have gone awry in the process of making this last-minute addition to the draft proposed standard document, since there are two sentences in the description of setlocale (section 4.4.1.1) that say almost the same thing using different words, and section 1.5 defines "locale-specific behavior" but not "locale".

The general term "locale" is intended in the context of X3J11 to refer to a complete, orthogonal set of selections of conventions for items that are allowed to affect program operation based on nationality, culture, or language. Thus "locale" is not synonymous with "location".
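
To make "change or query the program's entire locale or portions thereof" concrete, here is a minimal sketch. It assumes the interface as it was eventually standardized in <locale.h> and <string.h> (setlocale, LC_COLLATE, strcoll); the 1986 draft may have differed in detail. The helper names query_locale, select_collation, and collate_compare are hypothetical and appear nowhere in the draft.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Query the current setting for one category without changing it.
   A null pointer as the second argument means "query only". */
static const char *query_locale(int category)
{
    return setlocale(category, NULL);
}

/* Change only the collation portion of the locale, leaving every
   other category alone. Returns nonzero on success. */
static int select_collation(const char *name)
{
    return setlocale(LC_COLLATE, name) != NULL;
}

/* Compare two strings under whatever collating sequence is
   currently selected for LC_COLLATE. */
static int collate_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcoll(a, b);
}
```

A program might call select_collation("C") and then sort with collate_compare. Which locale names other than "C" and "" are accepted depends on the implementation, so none is assumed here.
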

By the way, one doesn't have to turn to oriental languages to find more than one way of sorting text. Even English has several different collating sequences, depending on the specific application.

>The claim you made was that "strcoll() amounts to a declaration that there
>IS a natural multibyte collating sequence for any single environment" is a
>little hard to parse. I assume you mean that "by specifying that there
>is such a routine, the proposers of strcoll() are declaring that there IS a
>natural multibyte collating sequence for any single environment." Given
>that "setlocale" exists, I fail to see how it declares this, unless
>"environment" is defined so that an environment always specifies a single
>collating sequence. In the latter case, the claim is true, but trivially so.

I used "environment" rather than "locale" since the technical X3J11 meaning of the latter is not well known. The existence of a natural collating sequence for a locale is not at all obvious; one might question whether it is really true for languages that use ideographs for their printed representation, for example.

>Fine. Are you prepared to admit that there *is* a non-trivial trade-off
>involved in the "short char" proposal (i.e., that it is not a given that
>few, if any, lines of *existing* code need change so that it can work
>equally well in an one-storage-unit "char" and a two-storage-unit "char"
>environment), and that some people might rationally disagree with your value
>weighting of the changes needed to existing code to make it work in a
>two-storage-unit "char" environment and to make it work in a "long char"
>environment?

I have been maintaining that very little existing code is affected: NONE on implementations that decide to make sizeof(char)==1, and almost none for the vast majority of applications code on implementations that decide to support multi-byte (char)s. I even gave examples of the most typical kinds of code dependence on sizeof(char)==1.

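The earlier examples are not reproduced in this article. As a hypothetical illustration only (not the examples referred to above), the sketch below shows one common kind of dependence: a byte count handed to malloc() is silently treated as a count of (char)s. The function names are made up for this sketch. In C as eventually standardized, sizeof(char) is 1 by definition, so the two routines behave identically there; the difference would matter only on an implementation with a larger (char).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Depends on sizeof(char)==1: the number of chars is passed
   directly as the number of bytes to allocate and copy. */
static char *dup_assuming_unit_char(const char *s)
{
    size_t n;
    char *p;

    assert(s != NULL);
    n = strlen(s) + 1;          /* chars, including the terminator */
    p = malloc(n);              /* assumes one byte per char */
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Same routine with the element size written out, so it makes
   no assumption about how large a char is. */
static char *dup_size_independent(const char *s)
{
    size_t n;
    char *p;

    assert(s != NULL);
    n = strlen(s) + 1;
    p = malloc(n * sizeof *p);
    if (p != NULL)
        memcpy(p, s, n * sizeof *p);
    return p;
}
```

The change between the two versions is a single multiplication at each place a count of (char)s is used as a count of bytes.
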
I can well believe that AT&T's STREAMS code would be heavily dependent on the constraint (in fact, I wonder whether it could even be made to work on a 20- or 36-bit word architecture, if it depends so much on the size of a (char)); however, I don't mind nearly so much making more work for kernel workers, network hackers, and other lower life forms as I do making more work for application developers. (As I said, different value weighting.)