Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!rutgers!mephisto!udel!haven!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: Array bounds checking: what is legal
Message-ID: <26327@mimsy.umd.edu>
Date: 1 Sep 90 20:37:35 GMT
References: <7611@ucdavis.ucdavis.edu> <26196@mimsy.umd.edu> <29051@nigel.ee.udel.edu>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 98

In article <26196@mimsy.umd.edu> I wrote:
>`&arr[sizeof arr/sizeof *arr]' ... is Officially Legal.

(Those who would dispute this are advised to see ANSI Standard
X3.159-1989, otherwise known as `The ANSI C Standard', sections 3.2.2.1
(Lvalues and function designators), 3.3.3.4 (The sizeof operator), and
3.3.6 (Additive operators).)

This seems to be rather universally misunderstood.  To amplify a bit:

In article <29051@nigel.ee.udel.edu> gdtltr@freezer.it.udel.edu
(Gary Duzan) writes:
>I don't believe accessing the element after is legal, but the pointer
>is still legal.

Correct.  Given `int a[4];', the following holds:

        int *p = a;                     /* legal */
        a[0], a[1], a[2], a[3];         /* all legal */
        p[0], p[1], p[2], p[3];         /* all legal */
        p = &a[4];                      /* legal */
        *p;                             /* illegal (a[4] does not exist) */
        p--;                            /* legal */
        p = a;                          /* legal */
        p--;                            /* illegal */
        p = &a[4];                      /* legal */
        p[-4], p[-3], p[-2], p[-1];     /* all legal */

Note the last carefully: it is not the subscript itself that makes a
given x[i] legal or illegal, but rather whether x+i yields a legal
address and, if so, whether *(x+i) is also legal.

Now, as to why &a[4] is legal when a[4] is not, consider:

        int i;

        for (i = 0; i < 4; i++)
                printf("%d\n", i);

When this code is run, i takes on five values, namely 0, 1, 2, 3, and 4.
Even if we alter the loop slightly to get rid of the `4', i still takes
on the value 4:

        for (i = 0; i <= 3; i++)
                ...

Now what happens if we loop `p' over the various elements in `a'?

        for (p = &a[0]; p < &a[4]; p++)
                ...

p must eventually take on the value &a[4].
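
The pointer loop above can be sketched as a complete function. This is
only an illustration, not code from the thread: the name `sum_forward'
and its parameters are made up for the example, and it assumes the caller
passes a real array of n ints.

```c
#include <stddef.h>

/*
 * Hypothetical helper: add up a[0]..a[n-1] with a pointer loop and
 * report, through *end_offset, how far past a[0] the pointer finished.
 */
static int sum_forward(const int *a, size_t n, ptrdiff_t *end_offset)
{
    const int *p;
    int sum = 0;

    /* a + n is the one-past-the-end address: fine to form and compare */
    for (p = a; p < a + n; p++)
        sum += *p;              /* only a[0]..a[n-1] are dereferenced */

    /* p == a + n here; subtracting is legal, *p would not be */
    *end_offset = p - a;
    return sum;
}
```

Called on `int a[4]' with n == 4, the sketch stores 4 in *end_offset:
p finished at &a[4], and was never dereferenced there.
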
There is no way around it; even if we get rid of the `&a[4]' in the loop,
p still winds up with &a[4] as its final value:

        for (p = &a[0]; p <= &a[3]; p++)
                ...
        /* now p == &a[4] */

Since this sort of thing happens all the time in existing code, there
was no choice but to make it Officially Legal and require all C
compilers to support it.

This, on the other hand, is not legal:

        for (p = &a[3]; p >= &a[0]; p--)        /* illegal */
                ...

This loop supposedly terminates when p takes on the value &a[-1]; but
as noted above, &a[-1] is not a legal address, and in fact this code
fails on some machines---for instance, on a 68000 where the C compiler
starts the data space at location 2, and `a' is a global array of 32-bit
`int's that happens to be the first object in the data segment.  The
code turns into, e.g.,

        loop:
                ...
                subql   #4,a2           # p--
                cmpl    #2,a2           # (unsigned long)p < 2?
                jcs     out             # if so, exit loop
                jra     loop            # otherwise continue

and when p==&a[0], p==2, so p-4 puts 0xfffffffe into p, which is still
greater than or equal to 2.

This is the same old fencepost problem that occurs everywhere.

Incidentally, there is a way to keep p from taking on &a[4]:

        for (p = a;; p++) {
                ...
                if (p == &a[3])
                        break;
        }

This is the same solution required for loops that purport to run to
MAXINT or MAXULONG or other such maxima, and it shares their drawback:
these are exceedingly ugly.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain: chris@cs.umd.edu        Path:   uunet!mimsy!chris