Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!uakari!ark1!dtix!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: null pointers
Keywords: null
Message-ID: <18728@mimsy.UUCP>
Date: 25 Jul 89 06:12:49 GMT
References: <883@lakesys.UUCP>
Distribution: usa
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 124

In article <883@lakesys.UUCP> chad@lakesys.UUCP (D. Chadwick Gibbons) writes:
>	As defined by K&R2 (A6.6, p. 198) null is
>
>		"An integral constant expression with [the] value 0, or such
>	an expression cast to type void * [which may be] converted, by a cast,
>	by assignment, or by comparison, to a pointer of any type."

This is the pANS definition (modulo minor possible wording changes since
the draft I have).  The Classic definition (Classic C vs New C) simply
leaves out the `void *' alternative.

The untyped nil pointer is an integral constant expression whose value
is zero; it acquires a type (and hence an actual internal
representation) by being cast, assigned, or compared to a pointer
type.  Note that `(void *)0' is a typed nil pointer, but is freely
convertible to other typed nil pointers (possibly changing in internal
representation in the process).

>	From what I have seen however, NULL may not be a number with a zero
>bit pattern, since some implementations do not store zero as a zero bit
>pattern (depending if the value in question is signed or unsigned.)

This is not quite right.  A typed nil pointer (the only `real' nil
pointers are all typed; the untyped nil pointer is a `fake') is not
necessarily represented as an all-zero-bits value.  When an untyped nil
acquires its type, it also acquires its representation, and that
representation may be virtually arbitrary, including being different
for different types of nil pointer.
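
One practical consequence, sketched under assumptions: the names
struct node, node_init, and nil_is_all_zero_bits are invented for this
example.  Setting a pointer member by assignment gives it the right
representation everywhere; filling memory with zero bytes only happens
to work where nil is all-zero-bits.  The second function merely reports
which kind of implementation you are on.

```c
#include <assert.h>
#include <string.h>

struct node {
    struct node *next;
    int value;
};

/* Portable: the assignment gives `next' whatever bits nil really has. */
void
node_init(struct node *n, int value)
{
    assert(n != 0);         /* caller must pass a real object */
    n->next = 0;
    n->value = value;
}

/*
 * Returns 1 if a nil `struct node *' is stored as all-zero bits on
 * this implementation, 0 otherwise.  Either answer is legal.
 */
int
nil_is_all_zero_bits(void)
{
    struct node *nil = 0;   /* typed nil, in its true representation */
    struct node *zeroed;

    memset(&zeroed, 0, sizeof zeroed);  /* all-zero bits, maybe not nil */
    return memcmp(&nil, &zeroed, sizeof nil) == 0;
}
```

Code that relies on memset or calloc to produce nil pointers is
betting that nil_is_all_zero_bits() returns 1; node_init makes no such
bet.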

>Thus, zero and NULL may not equal each other in all implementations,

More precisely, `The actual representation for int-zero may differ from
that of a nil pointer of any particular type.  The four-letter sequence
N, U, L, L, is a preprocessor macro which must be defined as one of
the source-code representations for an untyped or freely-convertible
nil pointer' (the latter is an escape clause to allow `(void *)0'),
`and might not happen to be the unadorned number ``0''.'

>yet _both_ may be used safely in comparison with a null pointer.

This is because the unadorned number ``0'' is an integral constant
expression whose value is zero, which is one of the two pANS-legal
source code representations for a general nil pointer, while the four
letter sequence N, U, L, L, is required to be defined as one of the two
pANS-legal source code representations for a general nil pointer.

>It appears that in pre-ANSI C, this is not true,

No: this has always been true.  Before the pANS (and before the dpANSes
that preceded the pANSes) there was only one source code representation
for a general nil pointer, namely an integral constant expression whose
value was zero.  Hence, there was only one legal definition (with many
possible spellings) for `#define NULL'.  Now there are two (again with
many possible spellings---how many ways can *you* write an integral
constant expression with value zero?).
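
To make the `many possible spellings' concrete, here is a short sketch
(count_nil_spellings is a made-up name).  It uses only literal
spellings; by the definition quoted above, something like `(1 - 1)'
qualifies too, though some compilers may warn about it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: assigns several spellings of the untyped nil
 * pointer to `char *' objects and counts how many come out nil. */
int
count_nil_spellings(void)
{
    char *p[4];
    int i, n = 0;

    p[0] = 0;       /* the usual spelling */
    p[1] = 0L;      /* a long constant is still an integral constant */
    p[2] = 0x0;     /* hexadecimal zero */
    p[3] = NULL;    /* whichever legal definition <stddef.h> chose */

    for (i = 0; i < 4; i++)
        if (p[i] == 0)
            n++;
    assert(n == 4); /* every spelling must yield a nil pointer */
    return n;
}
```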

>and often the symbol constant NULL requires casting to the proper type
>to ensure proper conversion of alignment restrictions.

This is wrong.  Leave off the `of alignment restrictions' and it
becomes correct, even in New C.  The symbol NULL expands either to the
untyped source code nil pointer (`0') or to the freely-convertible
source code nil pointer (`(void *)0'), which must be given a type (and
hence a true representation) before being used.  It acquires that type
by being cast, assigned, or compared to a variable or expression that
has a pointer type.

The freely-convertible nil pointer (void *)0 already has a type (and
hence a representation), but when it is used where its type is both
incorrect and not-automatically-converted-to-correct, it may have the
wrong representation as well.  There is only one such place: as an
argument to a function, where either the function has no prototype or
the argument falls in the `...' part of a prototype.  This is, by
design, the same place where the untyped source code nil pointer (`0')
is not automatically converted to a nil pointer of the correct type.
This is true in both New C and Classic C.

What all this means is that there are exactly two correct definitions
for NULL in New C, exactly one in Classic C, and exactly one place
where casts are required.  (There are no places where casts hurt.)  The
casts are required because each different kind of nil pointer can have
a different run-time representation (even though all can use the same
source code representation) and the compiler needs to know which
run-time representation to use.  Without the cast, the compiler has
to assume that you really meant `0' (if NULL is #defined to 0) or
`(void *)0' (if NULL is #defined this way), because functions without
prototypes, or functions with `...' prototypes, can certainly have
such arguments.
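
Here is a sketch of the one place where the cast is required.  The
function count_strings and its usage wrapper are invented for this
example.  It takes a variable-length list of `char *' arguments ended
by a nil `char *'; since the terminator travels through the `...' part
of the prototype, the caller has to supply the type.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/*
 * Hypothetical variadic function: counts its `char *' arguments up
 * to, but not including, a terminating nil `char *'.
 */
int
count_strings(char *first, ...)
{
    va_list ap;
    char *s;
    int n = 0;

    va_start(ap, first);
    for (s = first; s != 0; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}

/* Usage: the terminator is cast, because nothing else tells the
 * compiler that a nil `char *' is wanted in the `...' position. */
void
count_strings_usage(void)
{
    assert(count_strings("a", "b", (char *)0) == 2);
    /* count_strings("a", "b", 0) or (..., NULL) would pass an int or a
     * `void *', which need not match a nil `char *' in size or bits. */
}
```

The first argument has a prototype to convert it, so a bare 0 would do
there; the cast matters only for the arguments that fall under `...'.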

>	And then there is the problem of some architectures storing data at
>address zero.  According to the definitions above, this does not matter.

Correct---this is a separate problem, and comes down to the one of
choosing the run-time representations for each possible kind of nil
pointer.

>It is the compiler's job to assume that the constant zero is for
>checking for a null pointer, and if valid data can be at address zero
>that comparing a pointer against zero is _not_ a comparison with that
>address.  Or so I have extrapolated.

This is correct, but not terribly well phrased.  It is the compiler's
job to know that the integer constant zero can be a source code expression
meaning `nil pointer to T' for some type T, and to so convert it where
there is sufficient information---cast, assignment, or comparison to an
expression or variable of type pointer to T---and it is the entire
runtime system's job to make sure that whatever scheme is chosen works.

One scheme, which could be used on a machine where addresses from
0x0000 through 0x3FFF are the only valid addresses, would be to choose
the value 0xBAAD for all nil pointer types, and convert
	if (pointer_var == 0)
into
	compare var,0xBAAD
instructions, and so forth.  Another scheme, lazier and more common, and
the one used on the Unix PDP-11 split I&D systems, is to put a `shim'
at location zero: even though addresses 0x0000 through 0xFFFF are all
legal data locations, nothing useful lives at 0x0000, since the shim
already occupies it.  Then the system can
use 0 as the runtime representation for all nil pointer types, which
makes the compiler a little bit easier to write.
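
The first scheme can be modelled in a few lines.  This is only a toy:
the type mptr and the three functions are made up to stand in for what
such a compiler would emit, and a real compiler would do this in its
code generator rather than through function calls.  Machine pointers
are taken to be 16-bit words, with 0x0000 through 0x3FFF the only
valid addresses.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t mptr;      /* toy model of a machine pointer */
#define NIL_REP 0xBAADu     /* run-time bit pattern chosen for nil */

/* What the toy compiler emits for the source text `p = 0'. */
mptr
mptr_from_source_zero(void)
{
    return NIL_REP;
}

/* What it emits for `p == 0': a compare against 0xBAAD, not 0. */
int
mptr_is_nil(mptr p)
{
    return p == NIL_REP;
}

/* A pointer to the object living at machine address `addr'. */
mptr
mptr_from_address(uint16_t addr)
{
    assert(addr <= 0x3FFF); /* only these addresses are valid */
    return addr;
}
```

In this model a pointer to the object at machine address 0x0000 is not
nil, which is the point: the source-code `0' and the machine's address
zero have nothing to do with one another.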
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris