Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!utcsri!greg
From: greg@utcsri.UUCP
Newsgroups: comp.lang.c
Subject: Re: char (*a)[]  (was: Style [++i vs i++])
Message-ID: <5391@utcsri.UUCP>
Date: Wed, 31-Dec-69 18:59:59 EDT
Article-I.D.: utcsri.5391
Posted: Wed Dec 31 18:59:59 1969
Date-Received: Sun, 13-Sep-87 17:35:18 EDT
References: <8298@brl-adm.ARPA> <587@cblpe.ATT.COM> <189@xyzzy.UUCP> <2310@mmintl.UUCP> <871@mcgill-vision.UUCP> <2348@mmintl.UUCP> <253@xyzzy.UUCP>
Reply-To: greg@utcsri.UUCP (Gregory Smith)
Organization: CSRI, University of Toronto
Lines: 95
Summary: 

In article <253@xyzzy.UUCP> throopw@xyzzy.UUCP (Wayne A. Throop) writes:
>> franka@mmintl.UUCP (Frank Adams)
>> So does arithmetic on a null pointer produce undefined results?  I don't
>> have a copy of the proposed standard available, so I don't know what it
>> says.  This is about what it *should* say; if it doesn't, it should be
>> changed.
...
>on this point is that the standard does *NOT* say that arithmetic on the
>null pointer produces undefined results (contrary to my own
...
>> [it should be undefined because...]
>> So we have the general principle that pointer arithmetic should not be able
>> to adjust the value of the pointer outside the guaranteed "neighborhood" of
>> legal values near that pointer.

>> In the case of a null pointer, the only legal value in that neighborhood
>> is null itself; thus "(char *)0 + 1" produces an undefined result.
>> ("(char *)0 + 0" would be legal, and equivalent to "(char *)0".)

I agree that it should be illegal to do any kind of arithmetic on
a null pointer of any type.

All this stuff gets very interesting when you consider what happens on
an 80286 in its native 'protected' mode (as opposed to the 'fast 8086' mode
in which most of them are warming their sockets).

In this mode, a pointer is 32 bits: a 16-bit segment number and a 16-bit
offset. It is meaningless to do arithmetic on the segment number since
it is just an index into a table maintained by the OS. Pointer
arithmetic as we know it in C affects only the offset.

The CPU supports a 'null' pointer as follows: Any pointer whose segment
part is zero is considered a null pointer. It is not legal to dereference
such a pointer, and it is not legal to load one into the stack-pointer
register pair (SS:SP) or the program-counter register pair (CS:IP).
Violations cause hardware traps.

It is legal to load a null pointer as a 'data pointer'. (What this really
means is that you can put 0 into DS and ES but not CS or SS). Since
incrementing a pointer changes only the offset, the segment part stays
zero; thus the code for incrementing a pointer, when given a null pointer,
will always produce a null pointer.

The other weird bit concerns the range of these pointers. The
compiler may assign a separate segment for every data object.
The segment has a size, and any reference to that segment beyond this
size causes a trap.
Suppose I declare 'int foo[10]'; with 16-bit ints, I may get a 20-byte
segment for foo. Then &foo[10] is a pointer which is illegal to dereference.
This is good.
There are lots of bits of code like this:
	for( p = foo; p < &foo[10]; ++p ){
which cause p to be repeatedly compared to a constant invalid pointer until it
becomes an invalid pointer itself.  I can live with that.

What gets a little weird is this: pointer inequalities are done by comparing
only the offset part, since the comparison is invalid anyway if the segment
numbers are different. Also, offset arithmetic is done in 16 bits.  This
means that &foo[-1] is not only an invalid pointer, but (with foo at offset
0 of its segment) it will compare 'greater than' &foo[0], since its offset
wraps around to 0xfffe. As a result, the following won't work:
	for( p = &foo[9]; p >= foo; --p ){	/* loops forever */
Furthermore, if I declare a 64K segment ( int foo64[32768] ), the (overflowed)
value of &foo64[32767] + 1 is the same as &foo64[0]. Thus not even this
will work:
	for( p = foo64; p <= &foo64[32767]; ++p ){	/* loops forever */

In order to avoid these problems, then, we need a class of pointers
which cannot be dereferenced but which can be used in comparisons.
It is sufficient that these pointers be restricted to the form (&x)+1,
where x is any valid data object. (&x)+1 > (&x) must always hold for
any data object x (which rules out a full 64k byte segment on a 286).
It would be nice if &x-1 were always less than &x, but that is not
possible under this segmentation scheme.

The ANSI standard must have something about such pointers. Does it
say roughly the same thing about them as I have in the preceding paragraph?

Sorry for all the blather, but I have noticed several previous
postings that have overlooked these considerations. These people
may never have to program on such an architecture, but it seems
like it isn't too much trouble to avoid constructs which won't port.
What I am looking for is a somewhat more concrete definition of
which constructs will and won't work.

[ e.g. what about: p = &foo[-1]; do{ ++p; ... }while( p <= &foo[9] );
 Does the first ++p cause p to be &foo[0]?
 Can I legally add 4123 to &foo[0], and if I then subtract 4120 do
 I get &foo[3]?
]
P.S. I am not a segment fan, but a pragmatist recently transplanted
to the real world ( arrggg! ).
-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...