Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!uakari!ark1!dtix!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: null pointers
Keywords: null
Message-ID: <18728@mimsy.UUCP>
Date: 25 Jul 89 06:12:49 GMT
References: <883@lakesys.UUCP>
Distribution: usa
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 124

In article <883@lakesys.UUCP> chad@lakesys.UUCP (D. Chadwick Gibbons) writes:
> As defined by K&R2 (A6.6, p. 198) null is
>
>      "An integral constant expression with [the] value 0, or such
> an expression cast to type void * [which may be] converted, by a cast,
> by assignment, or by comparison, to a pointer of any type."

This is the pANS definition (modulo minor possible wording changes since the draft I have). The Classic definition (Classic C vs New C) simply leaves out the `void *' alternative.

The untyped nil pointer is an integral constant expression whose value is zero; it acquires a type (and hence an actual internal representation) by being cast, assigned, or compared to a pointer type. Note that `(void *)0' is a typed nil pointer, but is freely convertible to other typed nil pointers (possibly changing in internal representation in the process).

> From what I have seen however, NULL may not be a number with a zero
>bit pattern, since some implementations do not store zero as a zero bit
>pattern (depending if the value in question is signed or unsigned.)

This is not quite right. A typed nil pointer (the only `real' nil pointers are all typed; the untyped nil pointer is a `fake') is not necessarily represented as an all-zero-bits value. When an untyped nil acquires its type, it also acquires its representation, and that representation may be virtually arbitrary, including being different for different types of nil pointer.
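One practical consequence: an object whose bytes are all zero (from memset() or calloc()) is not guaranteed to contain a nil pointer, while assigning the constant 0 or NULL always produces one, because the compiler supplies the proper representation. A minimal sketch, assuming a hypothetical `struct node' and hypothetical helpers `node_init' and `node_new':

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical example type: a singly linked list node. */
struct node {
    int value;
    struct node *next;
};

/*
 * Portable initialization: assigning NULL (or 0) lets the compiler
 * supply whatever run-time representation a nil `struct node *'
 * has on this machine.
 */
void node_init(struct node *n, int value)
{
    n->value = value;
    n->next = NULL;
}

/*
 * calloc() fills the object with all-zero bits, which the language
 * does not promise is a nil pointer, so the pointer member is still
 * set explicitly through node_init().
 */
struct node *node_new(int value)
{
    struct node *n = calloc(1, sizeof *n);
    if (n != NULL)
        node_init(n, value);    /* do not rely on calloc's zero bits */
    return n;
}
```

On most current machines the all-zero-bits pattern happens to be a nil pointer, which is exactly why code that relies on it tends to work until it is moved.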
>Thus, zero and NULL may not equal each other in all implementations,

More precisely, `The actual representation for int-zero may differ from that of a nil pointer of any particular type. The four-letter sequence N, U, L, L, is a preprocessor macro which must be defined as one of the source-code representations for an untyped or freely-convertible nil pointer' (the latter is an escape clause to allow `(void *)0'), `and might not happen to be the unadorned number ``0''.'

>yet _both_ may be used safely in comparison of a null pointer.

This is because the unadorned number ``0'' is an integral constant expression whose value is zero, which is one of the two pANS-legal source code representations for a general nil pointer, while the four-letter sequence N, U, L, L, is required to be defined as one of the two pANS-legal source code representations for a general nil pointer.

>It appears that in pre-ANSI C, this is not true,

No: this has always been true. Before the pANS (and before the dpANSes that preceded the pANSes) there was only one source code representation for a general nil pointer, namely an integral constant expression whose value was zero. Hence, there was only one legal definition (with many possible spellings) for `#define NULL'. Now there are two (again with many possible spellings---how many ways can *you* write an integral constant expression with value zero?).

>and often the symbol constant NULL requires casting to the proper type
>to ensure proper conversion of alignment restrictions.

This is wrong. Leave off the `of alignment restrictions' and it becomes correct, even in New C. The symbol NULL expands either to the untyped source code nil pointer (`0') or to the freely-convertible source code nil pointer (`(void *)0'), which must be given a type (and hence a true representation) before being used. It acquires that type by being cast, assigned, or compared to a variable or expression that has a pointer type.
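The three conversion contexts (cast, assignment, comparison) fit in a few lines of New C, together with the one context where none of them applies: an argument that matches the `...' of a prototype. A minimal sketch; `count_strings' and `nil_contexts' are hypothetical functions, and `count_strings' is written in the style of execl(), reading `char *' arguments until it finds a nil `char *':

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/*
 * Hypothetical variadic function: counts `char *' arguments up to a
 * terminating nil `char *'.  Arguments matching the `...' get no
 * conversion, so the caller must cast the terminator to `char *'.
 */
int count_strings(const char *first, ...)
{
    va_list ap;
    int n = 0;
    const char *s;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}

int nil_contexts(void)
{
    int *ip = 0;                /* assignment: 0 becomes a nil `int *' */
    char *cp = (char *)0;       /* cast: 0 becomes a nil `char *' */
    int ok = (ip == 0) && (cp == NULL);  /* comparison: same conversion */

    /* `...' argument: only the cast tells the compiler which nil to pass. */
    ok = ok && count_strings("a", "b", "c", (char *)NULL) == 3;
    return ok;
}
```

Passing a bare NULL as the last argument to count_strings() may happen to work where every nil pointer is all-zero bits and the same size as an int, but only the cast is correct everywhere.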
The freely-convertible nil pointer (void *)0 already has a type (and hence a representation), but when it is used where its type is both incorrect and not automatically converted to the correct one, it may have the wrong representation as well. There is only one such place, and that is as an argument to a function, where either that function has no prototype in scope or that argument corresponds to the `...' part of its prototype. This is, by design, the same place where the untyped source code nil pointer (`0') is not automatically converted to a nil pointer of the correct type. This is true in both New C and Classic C.

What all this means is that there are exactly two correct definitions for NULL in New C, exactly one in Classic C, and exactly one place where casts are required. (There are no places where casts hurt.) The casts are required because each different kind of nil pointer can have a different run-time representation (even though all can use the same source code representation) and the compiler needs to know which run-time representation to use. Without the cast, the compiler has to assume that you really meant `0' (if NULL is #defined to 0) or `(void *)0' (if NULL is #defined that way), because functions without prototypes, or functions with `...' prototypes, can certainly have such arguments.

> And then there is the problem of some architectures storing data at
>address zero. According to the definitions above - this does not matter.

Correct---this is a separate problem, and comes down to choosing the run-time representation for each possible kind of nil pointer.

>It is the compiler's job to assume that the constant zero is for
>checking for a null pointer, and if valid data can be at address zero
>that comparing a pointer against zero is _not_ a comparison with that
>address. Or so I have extrapolated.

This is correct, but not terribly well phrased.
It is the compiler's job to know that the integer constant zero can be a source code expression meaning `nil pointer to T' for some type T, and to convert it accordingly wherever there is sufficient information---a cast, assignment, or comparison involving an expression or variable of type pointer to T---and it is the entire runtime system's job to make sure that whatever scheme is chosen works.

One scheme, which could be used on a machine where addresses 0x0000 through 0x3FFF are the only valid addresses, would be to choose the value 0xBAAD for all nil pointer types, and convert

	if (pointer_var == 0)

into

	compare pointer_var,0xBAAD

instructions, and so forth. Another scheme, more common and lazier, which is what was used for the Unix PDP-11 split I&D systems, is to put a `shim' in at location zero, so that even though addresses 0x0000 through 0xFFFF were all legal data locations, there was nothing useful at 0x0000, it being already occupied by the shim. Then the system can use 0 as the runtime representation for all nil pointer types, which makes the compiler a little bit easier to write.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu	Path: uunet!mimsy!chris
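The 0xBAAD scheme described above can be modeled in ordinary C by representing the hypothetical machine's pointers as 16-bit addresses and writing out, as functions, what its compiler would emit for the source forms `p = 0' and `p == 0'. This is only a sketch: the machine, the `sim_ptr' type, and the function names are all hypothetical, and a real compiler performs this translation silently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine: valid data addresses are 0x0000..0x3FFF only. */
#define SIM_MAX_ADDR 0x3FFFu
/* Run-time representation chosen for every nil pointer type. */
#define SIM_NIL 0xBAADu

typedef struct {
    uint16_t addr;              /* the machine-level bit pattern */
} sim_ptr;

/* What the compiler emits for the source text `p = 0;'. */
sim_ptr sim_from_null_constant(void)
{
    sim_ptr p = { SIM_NIL };
    return p;
}

/* What it emits for `p = &object;' with the object at a valid address. */
sim_ptr sim_from_address(uint16_t addr)
{
    sim_ptr p;
    assert(addr <= SIM_MAX_ADDR);
    p.addr = addr;
    return p;
}

/* What it emits for `if (p == 0)': compare against 0xBAAD, not 0. */
int sim_is_null(sim_ptr p)
{
    return p.addr == SIM_NIL;
}
```

In this model an object at address 0x0000 is an ordinary non-nil pointer: the source-code `0' and the machine address 0 have nothing to do with each other.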