Newsgroups: comp.lang.c
Path: utzoo!utgpu!news-server.csri.toronto.edu!torsqnt!lsuc!sq!msb
From: msb@sq.sq.com (Mark Brader)
Subject: Re: is this array access portable?
Message-ID: <1991Jun27.092604.5474@sq.sq.com>
Organization: SoftQuad Inc., Toronto, Canada
References: <1991Jun23.185351.5695@thunder.mcrcim.mcgill.edu>
Date: Thu, 27 Jun 91 09:26:04 GMT
Lines: 77

> 	sometype foo[40][50];
> 	sometype *fp;
> 	int i;
> 
> 	fp = &foo[0][0];
> 	for (i=2000;i>0;i--) *fp++ = something;
> 
> Is this portable?  (The significant question is whether the wraparound
> from the end of one row to the beginning of the next is guaranteed to
> work correctly.)

This is interesting.  The answer is that the *above* is valid ANSI C, but
the seemingly equivalent replacement for its last line

	for (i=0;i<2000;++i) fp[i] = something;   i=0;

*isn't* valid.

The reason is this.  The standard does guarantee that array elements
are stored in contiguous fashion as you expect, and that two-dimensional
arrays work in row major order (because they are really arrays of arrays).
Consequently, &foo[0][50] and &foo[1][0] are equal.  And since fp is
equal to the pointer to which foo[0] decays, &fp[50] equals &foo[1][0].
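
As a rough illustration of the layout claim in the paragraph above, here
is a minimal sketch.  The typedef of sometype as int and the function name
are my own assumptions for the example; nothing here is dereferenced out
of bounds, and every pointer formed is at most one past the end of foo[0].

```c
#include <assert.h>

typedef int sometype;	/* assumption: stand-in for the poster's element type */

static sometype foo[40][50];

/* Returns 1 if the end of row 0 coincides with the start of row 1. */
int row_end_meets_next_row(void)
{
	sometype *fp = &foo[0][0];

	/* An array of arrays has no padding between its rows. */
	assert(sizeof foo == 40 * 50 * sizeof(sometype));

	/* Pointers are only formed and compared here, never indirected through. */
	return &foo[0][50] == &foo[1][0] && fp + 50 == &foo[1][0];
}
```

Calling row_end_meets_next_row() should yield 1 on a conforming
implementation.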

You might expect that &fp[51] would therefore be equal to &foo[1][1].
It *isn't* -- any more than 3*xx/xx is equal to 3 when xx is zero.
Computing &fp[51] involves an out-of-bounds array reference and is
undefined behavior in ANSI C, whereas &foo[1][1] is valid.

Computing &fp[50] is also an out-of-bounds array reference, but you're
allowed to go *one* position past the upper bound in ANSI C if you don't
dereference the resulting pointer, and here that pointer also equals
&foo[1][0], which is a valid element.  The second version of the loop
could therefore fail on the iteration when i is 51.

If this is so, why is the first version valid?

Well, the difficulty in the second version is not the transition from
the first to the second row of the large array; it's in the addition
of 51 to fp.  If you increase the pointer value by steps of 1 at a time,
in due course you reach the magic value which is one past the end of the
first row -- AND is guaranteed to also be the beginning of the second row.

That is, in the first version of the loop, at some point you have the
value of &foo[0][49] in fp.  You indirect through that, which assigns to
foo[0][49].  Then you increment fp.  This increment takes it from &foo[0][49]
to &foo[0][50], which is valid since it doesn't go more than one place past
the end of the array foo[0].  Now ordinarily you couldn't indirect through
this pointer value.  But in this case you *can*, because it is known to
be equal to &foo[1][0], which you can indirect through all right.  And
then when you increment this, of course, you get &foo[1][1], and so on
to the end.
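
The walk just described can be sketched as follows.  This is only an
illustration under the reading given above; the typedef of sometype as
int and the two function names are assumptions of mine, not part of the
original question.

```c
#include <assert.h>

typedef int sometype;	/* assumption: stand-in for the poster's element type */

static sometype foo[40][50];

/* Store `something` in all 2000 elements by stepping one pointer. */
void fill_by_stepping(sometype something)
{
	sometype *fp = &foo[0][0];
	int i;

	/* Each increment moves at most one place past the end of the current row. */
	for (i = 2000; i > 0; i--)
		*fp++ = something;

	/* fp ends one past the last row and is never indirected through there. */
	assert(fp == &foo[39][50]);
}

/* Count the elements holding `something`, read with ordinary indexing. */
int count_matching(sometype something)
{
	int r, c, n = 0;

	for (r = 0; r < 40; r++)
		for (c = 0; c < 50; c++)
			if (foo[r][c] == something)
				n++;
	return n;
}
```

After fill_by_stepping(7), count_matching(7) should report all 2000
elements.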

I checked by email with Doug Gwyn before posting this, and he confirmed
that there had been an interpretation ruling on this or a very similar
case.

Now I have answered the question theoretically in terms of ANSI C.
In terms of K&R C, both loops are illegal, as the "one past the end" rule
didn't exist in K&R.  And in practical terms, very few implementations
would reject either one, simply because very few do array bounds checking.

But bounds checking *is* the only issue here; the alignment of the array
elements is guaranteed.  If you wanted to write a loop like the second
one without problems from bounds checking, you could either use an array
in a union:

	union {
		sometype f1[40][50];
		sometype f2[40*50];
	} foo;

or you could malloc() the array.
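
Both alternatives might look roughly like this.  It is a sketch, not a
definitive implementation: the typedef of sometype as int, the union tag
"grid", and the function names are hypothetical, and reading f1 after
writing f2 relies on the two members having the same element type and
the same layout.

```c
#include <assert.h>
#include <stdlib.h>

typedef int sometype;	/* assumption: stand-in for the poster's element type */

union grid {
	sometype f1[40][50];	/* two-dimensional view */
	sometype f2[40 * 50];	/* flat view, so indices 0..1999 are in bounds */
};

static union grid foo;

/* Fill through the flat member, then spot-check through the 2-D member. */
int fill_union(sometype something)
{
	int i;

	assert(sizeof foo.f1 == sizeof foo.f2);
	for (i = 0; i < 40 * 50; ++i)
		foo.f2[i] = something;
	return foo.f1[0][0] == something
	    && foo.f1[1][1] == something
	    && foo.f1[39][49] == something;
}

/* malloc() one flat object; returns 1 on success, 0 if allocation fails. */
int fill_malloced(sometype something)
{
	sometype *fp = malloc(40 * 50 * sizeof *fp);
	int i, ok;

	if (fp == NULL)
		return 0;
	for (i = 0; i < 40 * 50; ++i)
		fp[i] = something;
	/* Row r, column c lives at index r*50 + c. */
	ok = fp[1 * 50 + 1] == something && fp[39 * 50 + 49] == something;
	free(fp);
	return ok;
}
```

In both cases the index loop runs over a single object of 2000
elements, so no subscript goes past the bound of the array it is
applied to.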
-- 
Mark Brader, Toronto	    "If you feel [that Doug Gwyn] has a bad attitude,
utzoo!sq!msb, msb@sq.com     then use lint (or Chris Torek...)" -- Joe English

This article is in the public domain.