Newsgroups: comp.lang.c
Path: utzoo!utgpu!news-server.csri.toronto.edu!torsqnt!lsuc!sq!msb
From: msb@sq.sq.com (Mark Brader)
Subject: Re: is this array access portable?
Message-ID: <1991Jun27.092604.5474@sq.sq.com>
Organization: SoftQuad Inc., Toronto, Canada
References: <1991Jun23.185351.5695@thunder.mcrcim.mcgill.edu>
Date: Thu, 27 Jun 91 09:26:04 GMT
Lines: 77

> sometype foo[40][50];
> sometype *fp;
> int i;
>
> fp = &foo[0][0];
> for (i=2000;i>0;i--) *fp++ = something;
>
> Is this portable?  (The significant question is whether the wraparound
> from the end of one row to the beginning of the next is guaranteed to
> work correctly.)

This is interesting.  The answer is that the *above* is valid ANSI C,
but the seemingly equivalent last line

	for (i=0;i<2000;++i) fp[i] = something; i=0;

*isn't* valid.

The reason is this.  The standard does guarantee that array elements
are stored in contiguous fashion as you expect, and that two-dimensional
arrays work in row major order (because they are really arrays of
arrays).  Consequently, &foo[0][50] and &foo[1][0] are equal.  And since
fp is equal to the pointer to which foo[0] decays, &fp[50] equals
&foo[1][0].

You might expect that &fp[51] would therefore be equal to &foo[1][1].
It *isn't* -- any more than 3*xx/xx is equal to 3 when xx is zero.
Computing &fp[51] involves an out-of-bounds array reference and is
undefined behavior in ANSI C, whereas &foo[1][1] is valid.

Computing &fp[50] is also an out-of-bounds array reference, but you're
allowed to go *one* position past the upper bound in ANSI C if you don't
dereference the resulting pointer.  Assigning to fp[50] is an error.
The second version of the loop could fail on the iteration when i is 50.

If this is so, why is the first version valid?  Well, the difficulty in
the second version is not the transition from the first to the second
row of the large array; it's in the addition of 51 to fp.
If you increase the pointer value by steps of 1 at a time, in due course
you reach the magic value which is one past the end of the first row --
AND is guaranteed to also be the beginning of the second row.

That is, in the first version of the loop, at some point you have the
value of &foo[0][49] in fp.  You indirect through that, which assigns to
foo[0][49].  Then you increment fp.  This increment takes it from
&foo[0][49] to &foo[0][50], which is valid since it doesn't go more than
one place past the end of the array foo[0].  Now ordinarily you couldn't
indirect through this pointer value.  But in this case you *can*,
because it is known to be equal to &foo[1][0], which you can indirect
through all right.  And then when you increment this, of course, you get
&foo[1][1], and so on to the end.

I checked by email with Doug Gwyn before posting this, and he confirmed
that there had been an interpretation ruling on this or a very similar
case.

Now I have answered the question theoretically in terms of ANSI C.
In terms of K&R C, both loops are illegal, as the "one past the end"
rule didn't exist in K&R.  And in practical terms, very few
implementations would reject either one, simply because very few do
array bounds checking.  But bounds checking *is* the only issue here;
the alignment of the array elements is guaranteed.

If you wanted to write a loop like the second one without problems from
bounds checking, you could either use an array in a union:

	union {
		sometype f1 [40][50];
		sometype f2 [40*50];
	} foo;

or you could malloc() the array.
-- 
Mark Brader, Toronto	    "If you feel [that Doug Gwyn] has a bad attitude,
utzoo!sq!msb, msb@sq.com     then use lint (or Chris Torek...)" -- Joe English

This article is in the public domain.
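
The union approach mentioned in the article can be sketched as follows.
This is only an illustration under stated assumptions, not code from the
original thread: sometype is assumed to be int, and the names grid,
fill_flat, count_2d, ROWS and COLS are hypothetical.  The point is that
every index used in the fill loop stays within the bounds of the flat
member f2, while the data can still be read back through the
two-dimensional member f1.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 40
#define COLS 50

typedef int sometype;   /* assumption: the article leaves sometype unspecified */

/* The union from the article: one block of storage, viewed either as a
   40x50 array or as a flat array of 2000 elements. */
union grid {
    sometype f1[ROWS][COLS];
    sometype f2[ROWS * COLS];
};

/* Fill every element through the flat member.  Each index i lies in
   0 .. ROWS*COLS-1, so no subscript goes outside the bounds of f2. */
void fill_flat(union grid *g, sometype value)
{
    size_t i;

    /* Both members hold the same number of elements of the same type. */
    assert(sizeof g->f1 == sizeof g->f2);

    for (i = 0; i < (size_t)ROWS * COLS; ++i)
        g->f2[i] = value;
}

/* Count the elements equal to value, reading through the 2-D member. */
size_t count_2d(const union grid *g, sometype value)
{
    size_t r, c, n = 0;

    for (r = 0; r < ROWS; ++r)
        for (c = 0; c < COLS; ++c)
            if (g->f1[r][c] == value)
                ++n;
    return n;
}
```

A caller would declare a union grid object, call fill_flat() on it, and
then use the f1 member for ordinary row and column access.  The malloc()
alternative would look similar, with a single allocation of ROWS*COLS
elements indexed through one pointer.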