Xref: utzoo comp.lang.c:14694 comp.lang.c++:2240
Path: utzoo!attcan!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.lang.c,comp.lang.c++
Subject: Re: signed/unsigned char/short/int/long [was: #defines with parameters]
Summary: "Get with it!" -- Surely a quote from dpANS C ;->.
Message-ID: <375@aber-cs.UUCP>
Date: 11 Dec 88 19:27:40 GMT
References: <264@aber-cs.UUCP> <8982@smoke.BRL.MIL> <8983@smoke.BRL.MIL> <277@aber-cs.UUCP> <225@twwells.uucp> <330@aber-cs.UUCP> <244@twwells.uucp>
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: eunet,world
Organization: CS Dept., University College of Wales, Aberystwyth, UK
	(Disclaimer: my statements are purely personal)
Lines: 186

I realize that in my crudeness and brutality there is no hope for me to
achieve the extremely rarefied levels of wisdom and learning of certain
people endowed with a quick grasp of issues and gentlemanly manners of
debate. I therefore appeal (bowing my head, palms joined :->) to higher
authority.

Let me quote and summarize from one such easily recognizable higher
authority, and repeat my own contentions (if it is boring for you, think
how it is for me):

-----------------------------------------------------------------------------

# 4. What's in a name [ .... ]
# Objects declared as characters ("char") are large enough to store any
# member of the implementation's character set, and if a genuine character
# from that character set is stored in a character variable, its value is
# equivalent to the integer code of that character. Other quantities may be
# stored in a character variable, but the implementation is machine
# dependent.

character type == an integer type of sufficient length; whether "unsigned"
or "int" is up to the implementation.

# Up to three sizes of integer, declared "short int", "int", and "long int",
# are available. [ .... ]

integer type == any one of the three lengths of "int", not just "int".

# Unsigned integers, declared "unsigned", obey the laws of arithmetic
# modulo "2^n", where "n" is the number of bits in the representation. (On the
# PDP-11, unsigned long quantities are not supported.)

unsigned integer type == "unsigned" integers of all lengths, except on the
PDP-11. Their semantics are different from those of integer types, as they
obey the rules of modular, not algebraic, arithmetic.

# [ .... ] Because objects of the foregoing types can be usefully interpreted
# as numbers, they will be referred to as "arithmetic" types. Types "char" and
# "int" of all sizes will be collectively called "integral" types. [ .... ]

character type == "char", some large enough integer or unsigned integer type;
unsigned integer type == "unsigned" of all lengths;
integer type == "int" of all lengths (occasionally includes also "unsigned"s);
integral type == all three of them;
arithmetic type == integral types plus all lengths of "float".

# 6.1 Characters and integers
# A character or a short integer may be used whenever an integer is used.
# In all cases the value is converted to an integer.

There is no behavioural difference between char, short and other lengths of
"int", but for their range.

# Conversion of a shorter integer to a longer always involves sign
# extension; integers are signed quantities.

Integer types involve sign extension, by contrast with unsigned integer
types.

# Whether or not sign extension occurs for characters is machine dependent,
# [ .... ].

Whether "char" is an integer or an unsigned integer type is not prescribed.

# [ .... ] When a longer integer is converted to a shorter or to a "char",
# it is truncated on the left; excess bits are simply discarded.

There is no behavioural difference between "char" and "short", or other
lengths, except their size.

# 6.5 Unsigned
# Whenever an unsigned integer and a plain integer are combined, the
# plain integer is converted to unsigned and the result is unsigned.
# The value is the least unsigned integer congruent to the signed
# integer (modulo "2^wordsize"). [ .... ] When an unsigned integer is
# converted to "long", the value of the result is the same numerically
# as that of the unsigned integer. [ .... ]

The rules for conversions involving unsigned integers are different from
those for integers.

# 7. Expressions [ .... ]
# The handling of overflow and divide check in expression evaluation is
# machine dependent. [ .... ]

Note that, insofar as overflow is concerned, this only applies to integer
types, as unsigned integer types cannot overflow by definition. In other
words, exceeding the range of a length of "int" is not well defined, while
exceeding the range of a length of "unsigned" is. Another case where there
are behavioural differences between unsigned integer and integer types.

# 7.2 Unary operators
# [ .... ] The result of the unary "-" operator is the negative of its
# operand. The usual arithmetic conversions are performed. The negative of
# an "unsigned" quantity is computed by subtracting its value from "2^n",
# where "n" is the number of bits in an "int". [ .... ]

Another case where there are behavioural differences between unsigned
integer and integer types.

# 7.5 Shift operators
# [ .... ] The right shift is guaranteed to be logical (0 fill) if "E1"
# is "unsigned"; otherwise it may be (and is, on the PDP-11), arithmetic
# (fill by a copy of the sign bit).

Another case where there are behavioural differences between unsigned
integer and integer types.

# 8.2 Type specifiers
# [ .... ] The words "long", "short" and "unsigned" may be thought of as
# adjectives; the following combinations are acceptable: [ .... ]

Here lies the crux of the matter. Throughout it is repeatedly and explicitly
stated that unsigned integer types behave differently from integer types,
and that the character type does not behave differently from a sufficiently
long/short unsigned integer or integer type.
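
As a minimal sketch, the behavioural differences summarized above can be
checked in present-day C. The function names below are hypothetical, the
code assumes a hosted implementation providing <limits.h> and <assert.h>,
and it illustrates the rules as a current compiler applies them rather than
the behaviour of any 1978 implementation.

```c
#include <assert.h>
#include <limits.h>

/* Unsigned addition is defined modulo 2^n: exceeding the range wraps. */
unsigned wrap_add(unsigned a, unsigned b) { return a + b; }

/* Unary minus on an unsigned value yields 2^n - value (modulo 2^n). */
unsigned negate_unsigned(unsigned u) { return -u; }

/* Combining a plain int with an unsigned converts the int to unsigned
   first, and the result is unsigned. */
unsigned mixed_sum(int i, unsigned u) { return i + u; }

/* Right shift of an unsigned operand is logical (zero fill). */
unsigned logical_shift(unsigned u) { return u >> 1; }

/* Whether plain "char" sign-extends is left to the implementation. */
int plain_char_is_signed(void) { return CHAR_MIN < 0; }

/* Checks that hold on any conforming implementation. No signed
   counterpart of wrap_add is tested: overflowing an "int" is undefined. */
void check_unsigned_rules(void)
{
    assert(wrap_add(UINT_MAX, 1u) == 0u);
    assert(negate_unsigned(1u) == UINT_MAX);
    assert(mixed_sum(-1, 1u) == 0u);
    assert(logical_shift(UINT_MAX) == UINT_MAX / 2);
}
```

Calling check_unsigned_rules() from a main() should pass silently, while
the value of plain_char_is_signed() may differ from one compiler or target
to another, which is the incompleteness at issue for "char".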

Given this and the quoted phrase, it is apparent in hindsight that syntax
and semantics are incomplete, as there is no way to ensure the signedness
of a "char" (a similar problem exists with bit fields), and that syntax
does not properly reflect semantics. dpANS C addresses the first point
only, adding the "signed" keyword, which can be thought of as another
adjective, and adding several cases to the table of acceptable combinations.

My contentions (for the last time!) are that

[1] this is not necessary, as it is more natural to drop the pretense that
    "char" is a type distinct from "int", and instead adopt the notion
    that "char" is like "short", an adjective that modifies the length of
    its base type;

[2] it does not resolve the issue of making clear that "unsigned" is
    semantically different from "int", while the various lengths of either
    type are, but for the different ranges, semantically equivalent among
    themselves, and this distinction is important;

[3] both points can be economically addressed by redefining as integral
    types the class of all integer and unsigned types, as integer types
    the various lengths of "int", as unsigned types the various lengths of
    "unsigned", and as length adjectives/modifiers the keywords "char",
    "short", "long"; when the adjective is omitted, the base type has the
    length of "short" or "long", depending on the implementation; when the
    base type is omitted, "int" is presumed, except for length "char",
    where the choice is implementation dependent;

[4] the proposed rationalization, provided that "unsigned int" is made as
    a special case equivalent to "unsigned", is backward compatible;

[5] because of an easily made "mistake", some compilers, in the past or
    now, did not/do not complain when the rationalized syntax was/is used,
    and this could be easily blessed instead of eradicated;

[6] if it is felt desirable to substantially modify the declaration of
    "int" or "unsigned" types, a new keyword could be introduced for range
    definition, or the syntax for bit fields could be allowed outside

-------------------------------------------------------------------------

Kind reader, having had the patience to reach this point, make a last
effort, and please circle what you believe to be the correct answers:

[1] The material quoted above:

    [A] Is excerpted in an accurate, substantial and non-misleading way
        from "The C Programming Language - Reference Manual" (1978), the
        authoritative definition of Classic C.

    [B] I have made it up.

[2] The summaries I have made of the various passages quoted:

    [A] Accurately reflect the contents of said Reference Manual, or at
        least a consistent and historically defensible interpretation of
        those contents.

    [B] I have never read/understood the Reference Manual.

[3] The final contentions and suggestions are:

    [A] Supported by fair and reasonable technical arguments, based on the
        contents of said Reference Manual, as well as other more mundane
        points.

    [B] My advisor (if I had one) must be on drugs.

-- 
Piercarlo "Peter" Grandi			INET: pcg@cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science	UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)