Path: utzoo!mnetor!uunet!husc6!purdue!umd5!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.lang.c
Subject: Of Standards and Inventions: A Cautionary Tale
Message-ID: <10949@mimsy.UUCP>
Date: 6 Apr 88 14:15:44 GMT
Reply-To: chris@mimsy.umd.edu (Chris Torek)
Organization: University of Maryland, Dept. of Computer Sci.
Lines: 87

[Typography convention: /word/ represents /italics/; |word| represents
typewriter-text.]

By now most of you know my sentiments towards `noalias'.  Here,
however, is a sequence showing how even the most innocent-seeming
inventions can interact to produce surprising results.

First, a note about unsignedness:  In the C language, the unsigned
attribute on a type can be viewed as `sticky': operations on unsigned
numbers always yield an unsigned result.  (The only exception is the
ternary e1?e2:e3, whose result is independent of the type of e1.)
The condition can, of course, be cleared by a cast to a signed type.

Second, we have a long-standing clause in the draft standard on
/integer constants/, one that determines the type of a constant from
its value and that value's representation on your machine.  In itself
this is nothing new: even K&R say that whether |34567| is an |int| or
a |long| will depend on the number of bits in your |int|.  The dpANS
further says that a constant may become an |unsigned long|.  In
particular, on machines with 32-bit |long|s, values in
2147483648..4294967295 are |unsigned long|.  This is certainly
reasonable, or at least seems so.

Next we have the introduction of explicitly-unsigned constants.
|12U| is to be equivalent to |(unsigned)12|; |99LU| or |99UL| is
equivalent to |(unsigned long)99|.  This is quite a notational
convenience, just as is the existing L suffix, and adding it to
compilers is simple:  It took perhaps a dozen lines to add it to the
4.3BSD Vax and Tahoe compilers.  Again, reasonable, if something of a
frill.
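The `sticky' rule and the U suffix described above can be sketched in
a few lines of C.  This is only an illustration: the function name
|check_sticky_unsigned| is invented here and appears nowhere in the
draft, and the assertions assume nothing beyond the usual conversions
between |int| and |unsigned int|.

```c
#include <assert.h>

/*
 * Hypothetical helper, named for illustration only.  Each assertion
 * restates one claim about unsigned arithmetic as a run-time check.
 */
void check_sticky_unsigned(void)
{
    /*
     * Mixing int and unsigned: -1 is converted to unsigned before
     * the comparison, so it becomes a very large value and is NOT
     * less than 1U.
     */
    assert(!(-1 < 1U));

    /*
     * The ternary exception: the type of e1 (here 1U) does not
     * leak into the result, which stays a signed int.
     */
    assert((1U ? -1 : 1) < 0);

    /* The U suffix behaves like a cast to unsigned. */
    assert(12U == (unsigned)12);
    assert(sizeof 12U == sizeof(unsigned));

    /* A cast to a signed type clears the condition again. */
    assert(-1 < (int)1U);
}
```

Calling |check_sticky_unsigned()| from any program should return
silently; a compiler may warn about the signed/unsigned comparison in
the first assertion, which is rather the point.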
But now that we have this U suffix, and various files that use it, I
find that the preprocessor must do something with it.  And indeed,
the draft tells us that the preprocessor now has the notion of
unsigned arithmetic.  Rather than do everything in |long|s, ignoring
any U suffixes, it must obey the compiler's rules for combining
|long| and |unsigned long|.  Is this such a burden?  Perhaps; perhaps
not: a close approximation in the Reiser preprocessor, making
unsigned `sticky', took only a few changes (the approximation fails
only for e1?e2:e3 as noted above).  But having unsigned arithmetic
available in the preprocessor is clearly semantically desirable: it
would be nice to be able to tell whether the maximum unsigned short
is greater than 65535U:

        #include <limits.h>

        /*
         * Define a type to hold values in 0..65536.  We will
         * have a large array of these numbers, so use as little
         * space as possible.
         */
        #if USHRT_MAX > 65535U
        typedef unsigned short bigunum;
        #else
        typedef unsigned long bigunum;  /* dpANS says u_long must suffice */
        #endif

Each of these inventions (for inventions they are, at least as they
have been phrased) seems perfectly reasonable.  At least, each one
seems so to me.  But lo! what has happened when we combine them all?
The answer to that lies in the following question:

    On a machine with 32-bit |long|s and two's complement arithmetic,
    what is the type of -2147483648 in the preprocessor?

Since the preprocessor is required to follow the same rules as the
compiler, and is possessed of the notion of unsigned, we find that it
is first to compute 2147483648 and then to negate it, and when it
does the former it finds that the type is |unsigned long|.  The
negation changes nothing: /neither the type nor the value/.  As noted
earlier, the only way to remove the unsigned attribute is to use a
cast.  But since the preprocessor explicitly disallows casts, there
is no way to get -2147483648!
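The arithmetic behind that claim can be checked with a small sketch.
It is written in later C, using |uint32_t| from |<stdint.h>| as a
stand-in for the 32-bit |unsigned long| discussed above; that
substitution is an assumption made here so the example behaves the
same on any current compiler, where the constant 2147483648 may
instead get a wider signed type.  The function names are invented
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Negate the way a 32-bit unsigned type does: modulo 2^32.
 * (uint32_t is an assumed stand-in for a 32-bit unsigned long.)
 */
uint32_t negate_as_unsigned32(uint32_t x)
{
    return (uint32_t)(0u - x);
}

/* For contrast: negate in a wider signed type. */
int64_t negate_as_signed64(int64_t x)
{
    return -x;
}

/* Hypothetical helper collecting the claims as run-time checks. */
void check_negation(void)
{
    uint32_t big = UINT32_C(2147483648);        /* 2^31 */

    /* Unsigned negation of 2^31 gives 2^31 back: value unchanged. */
    assert(negate_as_unsigned32(big) == big);

    /* The result is still unsigned, hence never below zero. */
    assert(negate_as_unsigned32(big) > 0);

    /* With a signed type wide enough, the negative value appears. */
    assert(negate_as_signed64(INT64_C(2147483648)) < 0);
}
```

The contrast between the two negation functions is the whole story:
the value only becomes negative when some signed type is available to
hold it, and without a cast the unsigned arithmetic never reaches one.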
In particular, this means that

        #include <limits.h>
        #if LONG_MIN > 0

is guaranteed to be /true/ on any two's complement machine!

The moral, if you will, of this story is that even obvious and
well-behaved inventions may not always work together.  If something
as simple as putting unsigned arithmetic in the preprocessor has such
a surprising result, what can we expect of inventions like |noalias|?
Perhaps this will show why I am uneasy about /every/ invention in
this draft standard, even such obvious improvements as prototypes.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path: uunet!mimsy!chris