Path: utzoo!attcan!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.lang.c
Subject: Re: signed/unsigned char/short/int/long
Message-ID: <371@aber-cs.UUCP>
Date: 11 Dec 88 00:19:40 GMT
References: <264@aber-cs.UUCP> <8982@smoke.BRL.MIL> <8983@smoke.BRL.MIL>
    <277@aber-cs.UUCP> <225@twwells.uucp> <330@aber-cs.UUCP>
    <9086@smoke.BRL.MIL> <347@aber-cs.UUCP> <839@quintus.UUCP>
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: eunet,world
Organization: CS Dept., University College of Wales, Aberystwyth, UK
    (Disclaimer: my statements are purely personal)
Lines: 179

In article <839@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:

    In article <347@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
    >This is not
    >surprising, considering that C is a descendant of BCPL (whose single most
    >annoying feature is having to use putbyte() and getbyte() for string
    >manipulation, as it has just one length of integer).

    That hasn't been true of BCPL for a long time.  BCPL has two
    subscripting operators:

        base!index   index is a word offset, form addresses a word
        base%index   index is a byte offset, form addresses a byte

Handy syntactic sugaring for putbyte() and getbyte(), that I appreciated
for a short while, as it was still fairly recent when I switched to C...
(I did not use BCPL a lot, after all). But it was not there at the time C
was derived from BCPL!

I am persuaded that Ritchie & Co. evolved C from BCPL for essentially two
reasons: to have different lengths of word recognized by the compiler, and
to have a neater syntax.

BCPL has evolved a little bit... Still, you cannot define a byte array in
BCPL; this would go against the very fundamentals of the language. Richards
wrote that the easy portability of BCPL was largely based on having the
code generator deal with a single type.
    >In a sense C is a wonderfully equilibrated mix, BCPL with quite a good lot of
    >Algol68 thrown in, and this shows thru in things like some semantics
    >(BCPL-ish) of integer types, and their syntax (Algol68-ish).

    The semantics of C integral types resembles Algol 68 (which has e.g.
    int, short int, short short int, long int, long long int) rather
    than BCPL, which has only one type "word".

To me that is syntax. The semantics is that functionally C types are still
essentially "words", albeit of different lengths (to me, lengths do not
change the semantics of operations, and thus do not really introduce new
"types"). Unsigned was a significant departure, especially in that it was
defined to obey the rules of modular arithmetic.

    The syntax of C constants resembles BCPL rather than Algol 68 (e.g.
    no general "radix" notation, characters as integer constants rather
    than char constants, \escapes -- BCPL uses *escapes, Algol 68 has no
    escapes at all).

Indeed, indeed; exactly what I meant. Apparently BCPL, going into B, and
then early C, remained quite BCPL-ish; on the one hand clearly "struct" was
taken from Algol68, but the fact that members of structs were essentially
named offsets, with no (in)visibility rules, was easily a way to transpose
BCPL's "pointer!named.int.constant" into "pointer->field_name", which is
arguably nicer than the literal translation "pointer[named_int_constant]".
Of course there are a lot of other details... Enough for now of these.

Eventually Bourne & Co. (I surmise) did bring a lot more Algol68 lore
(actually, amusingly, Algol68*C* -- for Cambridge) into Bell Labs., and C
and Unix. As a sign of this, I was always greatly amused that in the
released V7 adb $a was described as "Algol68 stack backtrace", even if the
Algol68 compiler (probably a derivative of Algol68C, an excellent piece of
engineering) was never released to the general public... (at least not to
me, unfortunately!).
    > introducing the signed keyword and related paraphernalia instead of
    > allowing "int char" (an existing unintentional "feature" of some
    > compilers, by the way) to do the trick,

    I will make an argument against this.  "int" does *not* in general
    have the meaning "make it signed".

Yes, Yes. But it could be construed to... Actually I would not like, as you
have understood, to have it have that meaning. I would rather interpret
"char int" to mean "short short int" than "int char" to mean "signed
char"...

    For example, "int unsigned", if accepted, is not signed!

Yes, according to existing rules; but "unsigned int" (and "signed int") are
exactly what I am trying to make obsolete! In a sense you have spotted the
weak point of my argument; if a declaration were to be built of a length
modifier and a base type (both optional), then "unsigned int" would be
illegal (two base types!), against existing common practice. It could
however be declared obsolescent and allowed as a special case, which
admittedly is ugly, but virtually painless.

    I would definitely expect that if "int char" or "char int" were
    accepted at all, they would be identical to "char" in every respect.

Yes, with some caveats, in the dpANS framework. In my framework, char would
be a modifier, and unsigned/int base types. If the base type were omitted
by the programmer, either of the two base types could be defaulted by the
implementation, as currently is done. If not, "char int" would have to be
signed, and "char unsigned" not.

    What *would* have been consistent with C's intellectual ancestry,
    and *would* suggest signedness, would have been introducing
    "short short int" = "signed char" and
    "unsigned short short int" = "unsigned char".

Yes, again, except that I'd rather have "short short unsigned" mean
"unsigned char". I think that indeed one problem with Algol68 is that there
is no notion of unsigned.
Since (in C, at least), unsigned behaves differently from int, it ought to
be regarded as a different base type to which apply the same length
modifiers as int.

    But I'm quite sure that X3J11 considered this and rejected it for
    good reasons.

Essentially that "short short" is superfluous, as "char" in practice is
being used for that. In that I agree, after all C is not Algol68. As I have
indicated, however, I'd rather dispose of Algol68-like length indicators,
except as an obsolescent feature; instead of wasting a keyword on "signed",
I'd rather waste it on "range" or whatever, and let the compiler figure the
appropriate number of bits. As a more C-ish, and less radical alternative,
I'd extend the bit field notation to ordinary declarations.

Let me quote from a reply I sent (no, I am not yet like Prof. Dijkstra in
quoting only my own works, diary and letters :->) to somebody making points
similar to yours:

    But with the current scheme I find myself doing things like

        #ifdef pdp11
        #   define bits8    char
        #   define bits16   int
        #   define bits32   long
        #endif
        #ifdef vax
        #   define bits8    char
        #   define bits16   short
        #   define bits32   int
        #endif

    (note use of #define and not typedef because I want to be able to
    say things like "bits8 unsigned") and then, as a consequence,

        typedef bits8   ascii;
        typedef bits16  procid;
        typedef bits32  dollars;

    The first step is useless and circuitous, and less portable, as you
    have to have explicitly as many cases as you have machine types and
    compilers; I'd rather say:

        typedef unsigned    ascii   : 7;
        typedef int         procid  : 16;
        typedef int         dollars : 32;

THE END, FINALLY!

Now for some meta-discourse. I thank you for your civil reply. I also have
another reason to thank you. Evidently I have not been able to communicate
to Mr. Wells and Mr. Gwyn that I do know the existing language in the
Classic C book by K&R, and the one in dpANS C, even if I find it less
brilliant :-), and even if it has been a (now nearly fixed) moving target.
Evidently I have not been able to make them understand that I was trying to
show that with a little definitional legerdemain, for which there could be
some justification in existing or old compiler bugs, or in looking at
Classic C with a jaundiced, but historically justified, attitude, some
potentially confusing, and needless, X3J11 decisions could have been
avoided, and the Classic C syntax and pragmatics be made even simpler and
more symmetric, at virtually no cost in breaking existing programs.

A few people that have sent me msgs by email have penetrated my admittedly
somewhat heavvvvvy prose, and have understood as much, whether agreeing
(mostly) or disagreeing with me (like you).

I thank you for posting a reply that demonstrates to our audience, and not
to me alone, that somebody can understand the points I make, and address
them, instead of confusing my inability to express myself in a way
palatable to themselves with something else.
--
Piercarlo "Peter" Grandi                    INET: pcg@cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science    UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)