Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!cmcl2!husc6!mit-eddie!ll-xn!nike!lll-crg!lll-lcc!pyramid!oliveb!sun!guy
From: guy@sun.uucp (Guy Harris)
Newsgroups: net.lang.c
Subject: Re: sizeof(char)
Message-ID: <9053@sun.uucp>
Date: Fri, 7-Nov-86 19:21:56 EST
Article-I.D.: sun.9053
Posted: Fri Nov 7 19:21:56 1986
Date-Received: Sat, 8-Nov-86 18:39:21 EST
References: <4617@brl-smoke.ARPA> <657@dg_rtp.UUCP>
Organization: Sun Microsystems, Inc.
Lines: 121

> Guy missed the meaning of my reference to bitmap display programming.
> What I really care about in this context is support for direct bit
> addressing.

I am not at all convinced that anybody *should* care about this, at
least from the standpoint of bitmap display programming. If a vendor
permits you to bang bits on a display, they should provide you with
routines to do this; frame buffers are not all the same, and code that
works well on one display may not work well at all on another.
Furthermore, some hardware may do some bit-banging operations for you;
if you approach the display at the right level of abstraction, this can
be done transparently, but not if you just write into a bit array.

Furthermore, it's not clear that displays should be programmed at the
bit-array level anyway; James Gosling and David Rosenthal have made what
I consider a very good case against doing this (and no, I don't consider
it a good case just because I work at Sun and we're trying to push
NeWS).

> I know for a fact that one reason we don't HAVE this on some current
> architectures is the lack of access to the facility from
> high-level languages.

If that is the case, then the architect made a mistake. If it's really
important, they can extend the language. Yes, this means a non-standard
extension; however, the only way to get it to be a standard extension is
to get *every* vendor to adopt it, regardless of whether they support
bit addressing or not.
In the case of C, this means longer-than-32-bit "void *" on lots of
*existing* machines; I don't think the chances of this happening are
very good at all.

> I would like it to be POSSIBLE for some designer of an architecture
> likely to be used for bit-mapped systems to decide to make bits directly
> addressable.

It is ALREADY possible to do this. The architect merely has to avoid
thinking "if I can't get at this feature from unextended ANSI C, I
shouldn't put it in." The chances are very slim indeed that there will
be a standard way to do bit addressing in ANSI C, since this would
require ANSI C to mandate that all implementations support it, and would
require ANSI C to be rather more different from current C
implementations than most vendors would like.

> The idea of a "character" is that of an individually manipulable
> primitive unit of text.

As I've already pointed out, it is quite possible that there may be more
than one such notion on a system.

> However, in X3J11 practically everything that now refers to (char)
> arrays is designed principally for text application, while practically
> everything that refers to arbitrary storage uses (void *), not (char *).

However, you're now introducing a *third* type; when you are dealing
with arbitrary storage, sometimes you use "void *" as a pointer to
arbitrary storage and sometimes you use "short char" as an element of
arbitrary storage.

> In a good implementation using my (char)/(short char) distinction, it
> would be POSSIBLE to maintain a reasonable default collating sequence
> for (char)s so that a kludge like strcoll() would not normally be
> necessary.)

This is simply not true, unless the "normally" here is being used as an
escape clause to dismiss many natural languages as abnormal. Some
languages do *not* sort words with a character-by-character comparison
(e.g., German).
One *might* give ligatures like "SS" "char" codes of their own - but
you'd have to deal with existing documents with two "S"es in them, and
you'd either have to convert them "on the fly" in standard I/O (in which
case you'd have to have standard I/O know what language the file was in)
or convert them *en bloc* when you brought the document over from a
system with 8-bit "char"s. (Oh, yes, you'd still have to have standard
I/O handle 8-bit and 16-bit "char"s, and conversion between them, unless
you propose to make this new whizzy machine require text file conversion
when you bring files from or send files to machines with boring obsolete
old 8-bit "char"s.)

Furthermore, I don't know how you sort words in Oriental languages,
although I remember people saying there *is* no unique way of sorting
them.

> Using (long char) for genuine text characters would conflict with
> existing definitions for text-oriented functions, which is the main
> reason I decided that (char) is STILL the proper type for text units.

If you're going to internationalize an existing program, changing it to
use "lstrcpy" instead of "strcpy" is the least of your worries. I see
no problem whatsoever with having the existing text-oriented functions
handle 8-bit "char"s. Furthermore, since not every implementation that
supports large character sets is going to adopt 16-bit "char"s, you're
going to need two sets of text-oriented functions in the specification
anyway.

> The trade-off is between more compact storage (as in AT&T's approach)
> requiring kludgery to handle individual textual units, versus a clean,
> simple model of characters and storage cells that supports uncomplicated,
> straightforward programming.

What is this "kludgery"? You need two classes of string manipulation
routines. Big Deal. You need to convert some encoded representation in
a file to a 16-bit-character representation when you read the file, and
convert it back when you write it back. Big Deal.
This would presumably be handled by library routines. If you're going
to read existing text files without requiring them to be blessed by a
conversion utility, you'll have to do that in your scheme as well. You
need to remember to properly declare "char" and "long char" variables,
and arrays and pointers to same. Big Deal.

I am not convinced that the "char"/"long char" scheme is significantly
less "clean", "simple", "uncomplicated", or "straightforward" than the
"short char"/"char" scheme.

> While it is POSSIBLE to run into problems, such as in using the
> result of strlen() as the length of a memcpy() operation, these
> don't arise so often that it is hopeless to make the transition.

Sigh. No, it isn't necessarily HOPELESS; however, you have not provided
ANY evidence that the various problems caused by changing the meaning of
"char" would be preferable to any disruption to the "clean" models
caused by adding "long char". (Frankly, I'd rather keep track of two
types of string copy routines and character types than keep track of all
the *existing* code that would have to have "char"s changed to "short
char".)
--
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)