Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!lll-crg!rutgers!brl-adm!brl-smoke!gwyn
From: gwyn@brl-smoke.ARPA (Doug Gwyn)
Newsgroups: net.lang.c
Subject: Re: sizeof(char)
Message-ID: <5310@brl-smoke.ARPA>
Date: Sat, 8-Nov-86 09:02:04 EST
Article-I.D.: brl-smok.5310
Posted: Sat Nov 8 09:02:04 1986
Date-Received: Sun, 9-Nov-86 03:56:06 EST
References: <4617@brl-smoke.ARPA> <657@dg_rtp.UUCP>
Reply-To: gwyn@brl.arpa (Doug Gwyn (VLD/VMB))
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 136

Guy is still missing my point about bitmap display programming; I have NOT been arguing for a GUARANTEED PORTABLE way to handle individual bits, but rather for the ability to do so directly in real C on specific machines/implementations WITH THE FACILITY:

    typedef short char  Pixel;      /* one bit for B&W displays */
        /* fancy color frame buffers wouldn't use (short char) for
           this, but an inexpensive "home" model might */

    typedef struct
        {
        short   x, y;
        }   Point;

    typedef struct
        {
        Point   origin, corner;
        }   Rectangle;

    typedef struct
        {
        Pixel       *base;      /* NOT (Word *) */
        unsigned    width;      /* in Bits, not Words */
        Rectangle   rect;
        /* obscured-layer chain really goes here */
        }   Bitmap;             /* does this look familiar? */

Direct use of Pixel pointers/arrays tremendously simplifies coding for such applications as "dmdp", where one has to pick up typically six bits at a time from a rectangle for each printer byte being assembled (sometimes none of the six bits are in the same "word", no matter how bits may have been clumped into words by the architect).

Now, MC68000 and WE32000 architectures do not support this (except for (short char)s that are multi-bit pixels). But I definitely want the next generation of desktop processors to support bit addressing.

I am fully aware that programming at this level of detail is non-portable, but portable graphics programming SUCKS, particularly at the interactive human interface level.
Programmers who try that are doing their users a disservice. I say this from the perspective of one who is considered almost obsessively concerned with software portability and who has been the chief designer of spiffy commercial graphic systems (and who currently programs DMDs and REAL frame buffers, not Suns).

I'm well aware of the use of packed-bit access macros, thank you. That is exactly what I want to get away from! The BIT is the basic unit of information, not the "byte", and there is nothing particularly sacred about the number 8, either. I agree that if you want to write PORTABLE bit-accessing code, you'll have to use macros or functions, since SOME machines/implementations will not directly support one-bit data objects. That wasn't my concern.

Due to all the confusion, I'm recapitulating my proposal briefly:

ESSENTIAL:
    (1) New type: (short char), signedness as for (char).
    (2) sizeof(short char) == 1.
    (3) sizeof(char) >= sizeof(short char).
    (4) Clean up wording slightly to improve the byte (storage cell) vs. character distinction.

RECOMMENDED:
    (5) Fix character \-escapes so that larger numeric values are permitted in character/string constants on implementations where that is needed. The current 9/12-bit limit is a botch anyway.
    (6) Text streams read/write/seek (char)s, and binary streams read/write/seek (short char)s. This requires the addition of fgetsc() and fputsc(), which are routines I think most system programmers have already invented under names like get_byte().
    (7) Add a `b' size modifier for fscanf().

I've previously pointed out that this has very little impact on most existing code, although I do know of exceptions. (Actually, until the code is ported to a sizeof(short char) != sizeof(char) environment, it wouldn't break in this regard. That port is likely to be a painful one in any case, since it would probably be to a multi-byte character environment, and SOMEthing would have to be done anyway.
The changes necessary to accommodate this are generally fewer and simpler under my proposal than under a (long char)/lstrcpy() approach.)

As to whether I think that mapping to/from 16-bit (char) would be done by the I/O support system rather than by the application code, my answer is: Absolutely! That's where it belongs. (AT&T has said this too, on at least one occasion, even going so far as to suggest that the device driver should be doing it. I assume they meant a STREAMS module.)

I won't bother responding in detail to other points, such as the use of reasonable default "DP shop" collating sequences analogous to ASCII without having to pack/unpack multi-byte strings. (Yes, it's true that machine collating sequence isn't always appropriate, but does that mean that one never encounters computer output that IS ordered by internal collating sequence? Also note that strcoll() amounts to a declaration that there IS a natural multibyte collating sequence for any single environment.) Instead I will simply assure you that I have indeed thought about all those things (and more), have read the literature, have talked with people working on internationalization, and have even been in internationalization working groups.

I spent the seven hours driving back from the Raleigh X3J11 meeting analyzing why people were finding these issues so complex, and discovered that much of it was due to the unquestioned assumption that "16-bit" text had to be considered as made of individual 8-bit (char)s. If one starts to write out a BNF grammar for what text IS, it becomes obvious very quickly that that is an unnatural constraint. Before glibly dismissing this as not well thought out, give it a genuine try and see what it is like for actual programming; then try ANY alternative approach and see how IT works in practice.
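To make the division of labor concrete, here is a minimal sketch of an I/O-layer routine that hands the application whole 16-bit characters while the octet packing stays inside the library. Standard C has neither (short char) nor a 16-bit (char), so this rests on stated assumptions: unsigned char stands in for (short char), uint_least16_t stands in for a 16-bit (char), the "two octets, high octet first" encoding is invented for illustration, and fgetsc()/fputsc() are the proposed names emulated with getc()/putc(), as they might be on an implementation where sizeof(char) == sizeof(short char). get_text_char() and put_text_char() are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: uint_least16_t plays the role of a 16-bit (char); the
   unsigned char values moved by getc()/putc() play (short char). */
typedef uint_least16_t wide_char;

/* Hypothetical emulation of the proposed binary-stream primitives on
   an implementation where (short char) and (char) are the same size. */
static int fgetsc(FILE *fp) { return getc(fp); }
static int fputsc(int c, FILE *fp) { return putc(c, fp); }

/* Text-layer read: assembles one 16-bit character from two octets
   (ASSUMED encoding: high octet first). Returns EOF at end of stream;
   a dangling odd octet at the end is also reported as EOF. */
static long get_text_char(FILE *fp)
{
    int hi = fgetsc(fp);
    if (hi == EOF)
        return EOF;
    int lo = fgetsc(fp);
    if (lo == EOF)
        return EOF;
    return ((long)hi << 8) | (long)lo;
}

/* Text-layer write: splits one 16-bit character into two octets.
   Returns 0 on success, EOF on a write error. */
static int put_text_char(wide_char c, FILE *fp)
{
    if (fputsc((c >> 8) & 0xFF, fp) == EOF)
        return EOF;
    if (fputsc(c & 0xFF, fp) == EOF)
        return EOF;
    return 0;
}
```

Application code built on this loops over get_text_char() and deals only in characters; for example, it can compare the returned values directly whenever machine collating order is acceptable, with no multi-byte unpacking of its own.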
If you prefer, don't consider my proposal a panacea for such issues, but rather a simple extension that permits some implementers to choose comparatively straightforward solutions while leaving all others no worse off than before (proof: if one were to decide to make sizeof(char) == sizeof(short char), that is precisely where we are now). What I DON'T want to see is a klutzy solution FORCED on all implementers, which is what standardizing a bunch of parallel (long char) and (char) string routines (lstrcpy(), etc.) would amount to. If vendors think it is necessary to take the (long char) approach, the door is still open for them to do so under my proposal (without X3J11's blessing), but vendors who really don't care about 16-bit chars (yes, there are vendors like that!) are not forced to provide that extra baggage in their libraries and documentation.

The fact that more future CPU architectures than at present may support tiny data types directly in standard C is an extra benefit of my approach to the "multi-byte character" problem; it wasn't my original motivation, but I'm happy that it turned out that way. (You can bet that (short char) would be heavily used for Boolean arrays, for example, if my proposal makes it into the standard; device-specific bitmap display programming is by no means the only application that could benefit from the availability of a shorter type. I've seen many people #define TINY for nybble-sized quantities, usually having to use a larger size (e.g., (char)) than they really wanted.)

From the resistance he's been putting up, I doubt that I will convert Guy to my point of view, and I'm fairly sure that many people who have already settled on some strategy to address the multi-byte character issue are not eager to back out the work they've already put into it.
However, since I've shown that a clean conceptual model for such text IS workable, there's no excuse for continued claims that explicit byte-packing and unpacking is the only way to go.
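For comparison, here is roughly what the portable packed-bit access macros mentioned earlier look like when applied to the dmdp-style job of gathering six vertically adjacent pixels into one printer byte. This is a minimal sketch, not anyone's actual dmdp code: the PackedBitmap layout, the MSB-first bit order within each byte, the names BYTE_OF, MASK_OF, get_pixel, set_pixel and printer_byte, and the "top pixel in bit 0" pin order are all assumptions made for illustration.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* HYPOTHETICAL packed-bit bitmap: rows of unsigned char, MSB-first. */
typedef struct {
    unsigned char *base;    /* first byte of row 0 */
    size_t stride;          /* bytes per row */
    unsigned width;         /* in pixels (bits), not bytes */
    unsigned height;        /* in rows */
} PackedBitmap;

/* Which byte of a row holds pixel x, and which bit within that byte. */
#define BYTE_OF(x)  ((x) / CHAR_BIT)
#define MASK_OF(x)  (1u << (CHAR_BIT - 1 - (x) % CHAR_BIT))

static int get_pixel(const PackedBitmap *bm, unsigned x, unsigned y)
{
    const unsigned char *row = bm->base + (size_t)y * bm->stride;
    return (row[BYTE_OF(x)] & MASK_OF(x)) != 0;
}

static void set_pixel(PackedBitmap *bm, unsigned x, unsigned y, int on)
{
    unsigned char *row = bm->base + (size_t)y * bm->stride;
    if (on)
        row[BYTE_OF(x)] |= (unsigned char)MASK_OF(x);
    else
        row[BYTE_OF(x)] &= (unsigned char)~MASK_OF(x);
}

/* Assemble one 6-pin printer byte from column x, rows y..y+5.
   ASSUMED pin order: top pixel in bit 0. Rows past the bottom edge
   read as 0. Each pixel comes from a different row, hence a
   different byte. */
static unsigned char printer_byte(const PackedBitmap *bm,
                                  unsigned x, unsigned y)
{
    unsigned char b = 0;
    for (unsigned i = 0; i < 6; i++)
        if (y + i < bm->height && get_pixel(bm, x, y + i))
            b |= (unsigned char)(1u << i);
    return b;
}
```

All of the divide, modulo and mask work in BYTE_OF and MASK_OF is per-bit address arithmetic; direct Pixel pointers on a bit-addressed machine are meant to make exactly that unnecessary.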