Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!linus!decvax!decwrl!sun!guy
From: guy@sun.uucp (Guy Harris)
Newsgroups: net.lang.c
Subject: Re: Uses of "short" ?
Message-ID: <2796@sun.uucp>
Date: Fri, 13-Sep-85 21:49:58 EDT
Article-I.D.: sun.2796
Posted: Fri Sep 13 21:49:58 1985
Date-Received: Sun, 15-Sep-85 05:12:42 EDT
References: <486@houxh.UUCP> <2600013@ccvaxa>
Organization: Sun Microsystems, Inc.
Lines: 128

> But 'int' is a perfectly good abstraction; more abstract that 'short'
> or 'long.' The restriction of certain values to certain ranges CAN
> be part of an abstraction, but it can also be an incidental factor
> that is only useful because some machines make a distinction that
> makes it useful. That says to me that use of 'short' or 'long'
> instead of 'int' shows more attention to machine specificity.

The use of "long" instead of "int" shows more attention to machine
specificity?

OK, we have the object "Internet address". This object can be represented
as, among other things, a 32-bit quantity. (We neglect the problem of
non-binary machines for the nonce.) Implementing this object with an "int"
shows a hell of a lot of attention to machine specificity, since it won't
work worth a damn on a PDP-11, or any other machine with "int"s less than
32 bits (like machines based on current 8086-family chips, or
68000/68010/68008 machines with 16-bit-"int" compilers). Implementing it
with a "long" shows a lot less machine specificity, since (according to the
ANSI C standard) a "long" can hold numbers in the range -2147483647 to
2147483647. (On a two's complement machine, or a one's complement machine,
or even a sign-magnitude machine, this requires 32 bits.) The same argument
applies to "unsigned int" vs. "unsigned long".
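
To make the "Internet address" point concrete, here is a minimal sketch of
holding a 32-bit address in an "unsigned long". The names "ip_addr",
"make_addr" and "octet" are hypothetical, invented for this illustration;
they are not taken from the 4.xBSD code or any other real networking code.
The only thing the sketch relies on is the guarantee cited above, that an
"unsigned long" is at least 32 bits wide.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical type for illustration: "unsigned long" is guaranteed to be
 * at least 32 bits wide, so this holds a 32-bit address even on a machine
 * where "int" is only 16 bits. */
typedef unsigned long ip_addr;

/* Pack four octets (each assumed to be in 0..255) into one address value.
 * The casts make the shifts happen in "unsigned long", never in "int". */
static ip_addr make_addr(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
    return ((ip_addr)a << 24) | ((ip_addr)b << 16) |
           ((ip_addr)c << 8)  |  (ip_addr)d;
}

/* Extract octet number i (0 is the most significant of the four). */
static unsigned octet(ip_addr addr, int i)
{
    assert(i >= 0 && i <= 3);
    return (unsigned)((addr >> (8 * (3 - i))) & 0xFFul);
}
```

Had "ip_addr" been declared as "unsigned int", the shift by 24 would not be
valid on a 16-bit-"int" compiler, which is the sort of breakage described
above.
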

If the 4.xBSD networking code had been written with less implicit knowledge
of the machines it would work on - i.e., if the type "long" had been used
where the C specification says it should be (the information explicitly
described in the ANSI C standard is here considered part of the "implicit"
specification of C - yes, it's folklore, but UNIX is still dominated by
folklore) - it would have moved to 2.9BSD more easily. I believe a certain
popular news-reading system had much the same problem; it stored the length
of an article in an "int" instead of a "long". Earlier versions of Berkeley
Mail had the same problem (and "mailx" is based on one of those earlier
versions, alas).

"int" is to be used to implement "integer" objects whose value will *never*
be outside the range -32767 to 32767, and where the amount of space taken up
by the object is less important than the amount of time required to
manipulate it. (Well, modulo machines with 16-bit data paths and large
address spaces, where "int"s are often 32 bits even though it takes more
time to manipulate them than it does to manipulate 16-bit quantities.)

"short" is to be used to manipulate "integer" objects whose value will never
be outside that range but where the amount of space taken up by the object
is more important than the amount of time required to manipulate it (either
because there's a limit on the address space or physical memory available,
or because the object's representation must conform to some
externally-imposed restrictions).

"long" is to be used to manipulate "integer" objects whose value can be
outside the aforementioned range, or whose representation must conform to
some externally-imposed restriction that requires the use of "long".

Code that uses "int" to implement objects known to have values outside the
range -32767 to 32767 is incorrect C. The ANSI standard explicitly indicates
this.
Even in the absence of such an explicit indication in an official language
specification document, this information should be imparted to all people
being taught C. If you removed "long" from the C language, you would either

        1) have a language incapable of talking about numbers outside the
           range -32767 to 32767

or

        2) have a language which requires at least an 18-bit machine and
           probably at least a 32-bit machine.

"short" is less commonly used, since it provides no guarantees about the
range of integral values it can represent that "int" doesn't provide.
However, it should be obvious to anyone who is aware of the fact that
correct C code can, in most if not all cases, be moved from one machine to
another (assuming no operating system dependencies) simply by recompiling it
that there *is* a reason to use "short" instead of "int" even if
sizeof(short) == sizeof(int) and even if the data doesn't have to conform to
some external specification. Thinking of "short" as a compact form of "int"
and using it wherever space-efficiency is *or might be* of primary
importance will yield code that is more likely to run happily on a variety
of machines (and is less likely to piss off the guy who has to get the
program running efficiently on a computer other than one of the ones the
original programmer had in their shop).

> It may be the case that in a certain piece of code it is possible
> to prove that a variable's value must lie in a particular range.
> If the programmer specifies that range somehow, compilers for
> languages that support that distinction can produce code taking
> advantage of it. From the programmer's point of view, however,
> that provable range is probably not significant to her view of
> the process.

In a lot of cases, I damn well hope that the provable range is significant
to the programmer's view of the process.
In the code

        int a[10];
        int i;

        i = ...;
        a[i]++;

if the programmer's view of the process does not include the (provable)
condition that "i" will never have a value outside the range 0 to 9, this
code is incorrect and, by Murphy's Law, will proceed to demonstrate that
fact at the worst possible moment. Plenty of code demonstrates its
incorrectness in similar fashion; the code

        FILE *foo;
        char buf[SIZE];

        foo = fopen(..., "r");
        fgets(buf, SIZE, foo);

will so demonstrate on a Sun (or a CCI Power 5/20 or a lot of other
machines) simply by being run after ensuring that the file to be opened and
read does not exist. Replacing it with

        foo = fopen(...);
        if (foo != NULL)
                fgets(...);

will probably render it provable that the "fgets" call won't screw up - or,
at least, won't screw up by reading from an unopened file.

I don't know whether there are proof techniques which are powerful enough to
prove the correctness of code involving subrange types in all interesting
cases and which are practical. If there are, I'd like to see them
incorporated into compilers and have the compiler refuse to generate code
unless 1) the necessary checks are put in or maybe 2) explicit directives
are inserted *into the code* to tell the compiler that you know what you're
doing and it should trust you. (I don't want it to be a compiler option; I
want the code to explicitly indicate that it's being unsafe.)

        Guy Harris
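
For reference, the two fragments discussed above can be written out with
their checks made explicit. This is only a sketch under stated assumptions:
"NCOUNTS", "bump" and "first_line" are hypothetical names invented for the
illustration, and returning -1 on failure is one possible convention, not
something the article prescribes.

```c
#include <assert.h>
#include <stdio.h>

#define NCOUNTS 10      /* hypothetical name; matches "int a[10]" above */

/* Increment counts[i] only when i is within 0..NCOUNTS-1.
 * Returns 0 on success, -1 if i is out of range. */
static int bump(int counts[], int i)
{
    if (i < 0 || i >= NCOUNTS)
        return -1;
    counts[i]++;
    return 0;
}

/* Read the first line of the named file into buf.
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 * "fgets" is reached only after "fopen" is known to have succeeded. */
static int first_line(const char *path, char *buf, int size)
{
    FILE *fp;
    int ok;

    assert(path != NULL && buf != NULL && size > 0);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    ok = (fgets(buf, size, fp) != NULL);
    fclose(fp);
    return ok ? 0 : -1;
}
```

In both functions the check sits directly in front of the operation it
guards, so a reader (or a sufficiently clever compiler) can see that the
array index and the stream are valid at the point of use.
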