Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!brl-adm!brl-smoke!gwyn
From: gwyn@brl-smoke.ARPA (Doug Gwyn )
Newsgroups: comp.lang.c
Subject: Re: sizeof(char)
Message-ID: <5355@brl-smoke.ARPA>
Date: Tue, 11-Nov-86 07:32:03 EST
Article-I.D.: brl-smok.5355
Posted: Tue Nov 11 07:32:03 1986
Date-Received: Tue, 11-Nov-86 09:15:34 EST
References: <4617@brl-smoke.ARPA> <657@dg_rtp.UUCP>
Reply-To: gwyn@brl.arpa (Doug Gwyn (VLD/VMB) )
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 148

In article <1305@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes:
>This seems too simple. So, what have I missed?

There are a couple of factors. First, suppose I know that accessing a bit (whether by macro or by a language-supported data type) will actually load a whole word, perform separate masking operations, and then store the word back in memory, rather than accessing the bit directly in hardware. In that case I am likely to design my algorithms quite differently and to handle words as well as bits explicitly in my bitmap code. Second, to support straightforward programming techniques such as looping through arrays and incrementing pointers, the compiler has to officially bless the data type as a basic or derived type.

I should perhaps remind everyone that I am discussing explicitly NON-portable bitmap programming, since I have NOT proposed that ALL C implementations directly support bit-sized data objects. For PORTABLE bitmap programming (assuming you are concerned about it), one would indeed have to assume the worst and be prepared to handle word masking.

In case people have forgotten, C is not only a language for portable application programming; it is also (even foremost) a system implementation language. Nitty-gritty system-level programming often has to deal with specifics of the hardware architecture.
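For the PORTABLE case, here is a minimal sketch of word-masked bitmap code. It assumes bits are packed into (unsigned int) words with no padding bits. The names bitword, WORD_BITS, WORDS_FOR, bit_set, bit_clear, bit_test and bit_set_range are hypothetical, chosen only for illustration. Every single-bit access loads a whole word, masks it, and (for writes) stores it back. The range routine handles whole words directly where it can, which is the kind of algorithmic adjustment described above.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical word type for the bitmap (assumed to have no padding bits). */
typedef unsigned int bitword;

/* Bits per word and words needed for nbits, derived from CHAR_BIT
   rather than assuming 8-bit bytes. */
#define WORD_BITS (sizeof(bitword) * CHAR_BIT)
#define WORDS_FOR(nbits) (((nbits) + WORD_BITS - 1) / WORD_BITS)

/* Each single-bit operation loads the containing word, masks it,
   and (for writes) stores the whole word back. */
static void bit_set(bitword *map, size_t i)
{
    map[i / WORD_BITS] |= (bitword)1 << (i % WORD_BITS);
}

static void bit_clear(bitword *map, size_t i)
{
    map[i / WORD_BITS] &= ~((bitword)1 << (i % WORD_BITS));
}

static int bit_test(const bitword *map, size_t i)
{
    return (int)((map[i / WORD_BITS] >> (i % WORD_BITS)) & 1u);
}

/* Set bits lo..hi-1.  Leading and trailing partial words are done one
   bit at a time; interior whole words are set with a single store each. */
static void bit_set_range(bitword *map, size_t lo, size_t hi)
{
    if (lo >= hi)
        return;
    while (lo < hi && lo % WORD_BITS != 0)
        bit_set(map, lo++);
    while (hi - lo >= WORD_BITS) {
        map[lo / WORD_BITS] = ~(bitword)0;
        lo += WORD_BITS;
    }
    while (lo < hi)
        bit_set(map, lo++);
}
```

For example, start with `bitword map[WORDS_FOR(100)] = {0};` and call `bit_set_range(map, 10, 90)`. This sets bits 10 through 89, filling any interior whole words with one store apiece instead of bit by bit.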
Software portability is important (most of you should be aware by now that I have strong feelings about that), but concern for it should not be allowed to limit the options of people who have an actual requirement to use C in intrinsically "dirty" ways.

----------

Allow me to repeat: proposal X3J11/86-136 is actually intended to help solve the MULTI-BYTE CHARACTER PROBLEM (which DOES exist). Its possible ramifications for system-specific bitmap programming are really a side issue. Such considerations can nevertheless help clarify exactly which implementation possibilities the formal proposal opens up.

Note that I am careful to distinguish between a character and a (char). By a character I mean an individually manipulable unit that represents a natural piece of text; a (char) is a basic C data object. I refer to individually addressable storage units as bytes, no matter how many bits they consist of. If you don't keep these distinctions in mind, you will NOT understand my proposal or my explanations!

If I were developing programs to run on an Imagen print station, I would very much prefer my compiler to support "Galactic ASCII" (16-bit data) as a basic data type, in fact as a (char). If I am developing DMD code, I very much prefer my compiler/hardware to support the individual bit directly as a basic data type/byte, but I also need characters separately; in fact GASCII would be ideal for the DMD. If I were developing a generic operating system for world-wide distribution, I would very much prefer my compiler to support individual text elements (characters; note that "letters" is too limited a term for this) as basic data types. All that my proposal does is give compiler implementers the FREEDOM to choose these trade-offs appropriately for the intended major application. It doesn't force any particular choice of size for characters or bytes as basic data objects.
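Where the compiler does not provide a 16-bit (char), code that handles "Galactic ASCII" has to fall back on the lowest-common-denominator approach. It spreads each 16-bit code unit over however many (char)s the implementation provides. The following sketch is only an illustration and is not part of X3J11/86-136. It takes the width of a (char) from CHAR_BIT rather than assuming 8 bits. The names chars_per_unit and put_unit16 are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: number of (char) objects needed to hold a value
   of `bits` bits on this implementation (CHAR_BIT is at least 8). */
static size_t chars_per_unit(unsigned bits)
{
    return (bits + CHAR_BIT - 1) / CHAR_BIT;
}

/* Hypothetical helper: store a 16-bit code unit into buf, most
   significant part first, CHAR_BIT bits per (unsigned char).
   buf must have room for chars_per_unit(16) elements (at most 2).
   Returns the number of chars written. */
static size_t put_unit16(unsigned char *buf, unsigned long code)
{
    size_t n = chars_per_unit(16);
    size_t i;

    code &= 0xFFFFUL;                  /* keep only the 16-bit code unit */
    for (i = 0; i < n; i++) {
        unsigned shift = (unsigned)((n - 1 - i) * CHAR_BIT);
        buf[i] = (unsigned char)((code >> shift) & UCHAR_MAX);
    }
    return n;
}
```

With 8-bit (char)s this writes two chars per code unit; with a 16-bit (char) it writes one. The calling code is the same in both cases.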
(However, if one uses a choice inappropriate for the application, or if one doesn't control which compiler will be used, then one HAS to resort to "lowest common denominator" assumptions in one's coding; this is also the current state of affairs. I really don't think that insisting a (char) must be an 8-bit byte will help this situation, and that assumption is ALREADY FALSE for K&R and X3J11 C.)

If you're worried about the possible impact of the proposal on your own code, perhaps I can reassure you. I maintain approximately a HALF-MILLION lines of C code, mostly written by others, practically none of whom worried about these matters. Not a single line of it is affected by my proposal, so long as the compiler implementer continues to make (char) and (short char) the same size. If these data types were to have different sizes, then a few things would indeed break, as follows:

Use of sizeof"string_constant" instead of strlen("string_constant")+1. Because sizeof counts bytes, the two would no longer agree once a (char) occupies more than one byte. This occurs in about 10 places. (I had to find all of these once, because an older Gould compiler insisted that sizeof"string_constant"==sizeof(char *).)

Coercing other pointer types to (char *), doing address arithmetic, then coercing the pointer back. This is atrocious practice in the first place and seldom occurs. I estimate that at most 20 to 50 places would need to be fixed, by using (short char *) instead of (char *) (or better, by redesigning the code).

Specifically byte-oriented I/O routines, such as are required to meet predefined or machine-independent protocols. These occur in nearly 100 places, and most of them make rather severe assumptions about the run-time environment, usually that getc/putc necessarily input/output precisely 8 bits at a time.
It is simple to adapt these to the multi-byte (char) environment, for example by using getsc/putsc. However, such pieces of code are necessarily implementation-dependent anyway and should always be checked when porting to significantly different environments.

Actually, most of this code was developed for an environment of 7- or 8-bit characters, one character per (char). It POTENTIALLY needs a fair amount of rework for a more general character environment no matter WHAT approach is used. With my proposal, VERY LITTLE need be changed in such code, since its text handling already assumes that a (char) represents a single character (see my NOTE above!). With (long char) approaches, a SUBSTANTIAL amount of rework would be needed. To be fair, the amount of rework for (long char) can be reduced if one artificially constrains (long char)s so that neither byte may be zero except in the "null character" string terminator. Such a constraint is not at all necessary with my approach. There, as in current K&R and X3J11 C, a "null character" is precisely one whose numeric value is 0, without any worry about subfields. Note also that an artificial constraint is also known by the pejorative name of "kludge"; some of us have an aversion, not necessarily irrational, to kludges.

I should finally remark that Guy Harris shows every sign of having made up his mind on the issue before knowing what was proposed. He labeled my comments about the implications of the strcoll() approach "bullshit" and then proceeded to explain setlocale() to me. That indicates he isn't LISTENING to what I'm saying; after all, I'm one of the people who decided how those facilities would be specified. Who does he think he is? The implication is that I must be terribly stupid, since I don't understand stuff I helped design.
If one were instead to assume the more likely theory that I DO understand the significance of those facilities, then it would appear that Guy doesn't appreciate the point I was making. My guess is that he is so accustomed to responding to ignorant amateurs in this newsgroup that, whenever he doesn't immediately agree with someone, he assumes they too must be "morons." Their remarks are then, in his view, not worth the effort or courtesy of understanding before responding. I have taken a lot of trouble in choosing my exact wording. I therefore resent very much his apparent assumption that my words represent sloppy, approximate concepts; just because many people write like that is no reason to assume that I do!

Don't be misled by other people's misconceptions. If you seriously want to evaluate my proposed solution to the multi-byte character problem and don't have access to X3J11/86-136, then refer to the latter part of my article <5310@brl-smoke.ARPA>. Pretty much skip its discussion of bitmap programming until you understand the logical meaning of the formal proposal, and don't rely on the hash some people's responses have made of the proposal. Try assuming that I have NOT made some trivial blunder, then figure out what point of view allows me to make the claims I have been making. Only once you understand precisely WHAT I have in mind should you go back and examine counter-responses. (This is the approach you should be taking to intellectual issues anyway.)

I'm asking that you figure out this proposal from what I have presented, rather than spend lots of net time arguing over misconceptions. I'm fully prepared to admit that any alternative solution to the multi-byte character issue has pros and cons, as does any solution to the bitmap programming issues if that's more your concern. One might also rationally disagree with my proposal because one weights the trade-offs differently.
However, rational discussion first requires accurate communication and understanding of the ideas in question. I've done the best I can to explain them; now it's your turn to do the best you can to understand them. Otherwise, let's end the discussion now.