Path: utzoo!attcan!uunet!samsung!umich!ox.com!yale!cmcl2!adm!smoke!gwyn
From: gwyn@smoke.BRL.MIL (Doug Gwyn)
Newsgroups: comp.std.c
Subject: Re: X3J11 Pleasanton meeting summary
Message-ID: <14049@smoke.BRL.MIL>
Date: 8 Oct 90 12:09:09 GMT
References: <13996@smoke.BRL.MIL> <1990Oct3.184359.2348@sq.sq.com> <1737:Oct803:02:5890@kramden.acf.nyu.edu>
Organization: U.S. Army Ballistic Research Laboratory, APG, MD.
Lines: 99

In article <1737:Oct803:02:5890@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <1990Oct3.184359.2348@sq.sq.com> msb@sq.sq.com (Mark Brader) writes:
>> >     int a[4][5];
>> >     a[1][7] = 0;    /* undefined behavior */
>> > Dave Prosser (our Redactor) vigorously protested the above interpretation.
>> My opinion is that the protest was right and the ruling wrong.
>On what basis? If I declare char x[100][3], for example, the compiler
>might want to allocate an extra byte for each element of x. Isn't this
>allowed by the standard?

Okay, I guess there is some point in summarizing the X3J11 discussion
about this issue.

Let's first of all give names to specific relevant types:

    typedef int cell;   /* could be any object type, not just "int" */
    typedef cell row[5];
    typedef row matrix[4];

Then the largest object involved is

    matrix a;           /* same as int a[4][5]; */

There are other objects that can readily be identified here:

    a[1];               /* denotes a row object */
    a[1][4];            /* correctly denotes a cell object */

X3J11 seemed to agree that there are sufficient constraints in the
standard that one can assert the following:

    assert(sizeof a == 4*sizeof a[1]
        && sizeof a == 4*5*sizeof a[1][4]);

In other words, the size of an array element INCLUDES any padding
necessary for adjacent elements to abut cleanly, and there is no
additional padding included in an array object.  Thus, every
implementation COULD choose to give a[1][7] a well-defined meaning;
alignment and padding are not issues.

The actual issue is, what really constitutes an array object.
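
The size relationship asserted above can be checked mechanically.  What
follows is only a minimal sketch that reuses the typedefs from this
article; the helper name cells_in_matrix is hypothetical and appears
nowhere in the standard or in the committee discussion.

```c
#include <assert.h>
#include <stddef.h>

typedef int cell;       /* could be any object type, not just "int" */
typedef cell row[5];
typedef row matrix[4];

/* Hypothetical helper: counts the cells in a matrix from sizes alone. */
size_t cells_in_matrix(void)
{
    matrix a;           /* same as int a[4][5]; only measured, never read */

    /* An array object contains no padding beyond what is already
       counted in the size of each of its elements. */
    assert(sizeof a == 4 * sizeof a[1]);
    assert(sizeof a == 4 * 5 * sizeof a[1][4]);

    return sizeof a / sizeof a[1][4];
}
```

Note that sizeof does not evaluate its operand here, so nothing in this
sketch depends on the contents of a.  It says something about layout
only, not about which subscript expressions are permitted.
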
Note that in the declaration grammar, for example in 3.5.4.2, an array
HAS only one level of aggregation.  There is not officially any such
thing as a "multi-dimensional array" in C, only arrays of arrays.
(The description in such terms in 3.3.2.1 Semantics should be
considered informal, English, usage of "multidimensional" for purposes
of exposition, not the implicit introduction of a technical language
construct.  In fact, you have to take that description as referring to
the more precise notion of arrays of arrays in order for the
description to make any sense.)

3.3.2.1 (p.40) and 3.3.6 (p.48) state quite clearly, in the majority
view of X3J11, that subscripting an array in effect removes ONE level
of a multi-level aggregation.  The wording on p.48 is, for example:

    "When an expression that has integral type is added to or
    subtracted from a pointer, the result has the type of the pointer
    operand.  If the pointer operand points to an element of an array
    object, and the array is large enough, the result points to an
    element offset from the original element such that the difference
    of the subscripts ...  In other words, if the expression P points
    to the i-th element of an array object, the expressions ... point
    to, respectively, the i+n-th and i-n-th elements of the array
    object, provided they exist. ...  If both the pointer operand and
    the result point to elements of the same array object, ... the
    evaluation shall not produce an overflow; otherwise, the behavior
    is undefined.  Unless both the pointer operand and the result
    point to elements of the same array object, ... the behavior is
    undefined if the result is used as an operand of the unary *
    operator."

(Note that 3.3.2.1 in effect rewrites E1[E2] as (*(E1+(E2))) (they are
"identical"), so unary * is applied in our example.)

The above should prepare you to understand the committee's response:

For an array of arrays, the permitted pointer arithmetic in Standard
section 3.3.6 Semantics (p. 48, ll.
12-40) is to be understood by interpreting the use of the word
"object" as denoting the specific object determined directly by the
pointer's type and value, NOT other objects related to that one by
contiguity.  For example, the following code has undefined behavior:

    int a[4][5];
    a[1][7] = 0;        /* undefined */

Some conforming implementations may choose to diagnose an "array
bounds violation", while others may choose to interpret such attempted
accesses successfully with the "obvious" extended semantics.  Note
that even such a subterfuge as the following would not be strictly
conforming:

    void func( int *p ) {
        p[7] = 0;
    }
    int main( void ) {
        int a[4][5];
        func( a[1] );
        return 0;
    }

This isn't much of a practical problem, at least not in most code that
I recall having written, because most often multidimensional
"matrices" are actually allocated as 1-dimensional arrays of the
desired "cell" type, accessed later via a pointer to one of the cells
(normally the first), and p.48 supports that usage.

Indeed, as NCEG considers ways to add more useful notions of arrays
and subarrays to the C language, such tight constraints on what are
and are not permissible operations on such objects may well prove to
be essential, at least from the point of view of implementors of C
compilers on "vector" architectures.

What is missing in the standard that would be required for such
punning to be strictly conforming is some sort of guarantee that an
array of arrays of T is also in some contexts considered an array of T
itself.  As it stands, the rigorous type structure shines through too
plainly.  Some X3J11 members actually want precisely that, arguing
that their implementations warn about array bounds violations and that
their customers have indicated that they strongly desire that feature.
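
The 1-dimensional allocation mentioned above, which p.48 does support,
might look roughly like the following.  This is a sketch under stated
assumptions, not anything taken from the committee response: the names
matrix_alloc and matrix_cell are hypothetical, and the row-major
indexing is simply the usual convention.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates rows*cols zeroed cells as ONE array of
   int.  Returns NULL on failure or if rows*cols would overflow. */
int *matrix_alloc(size_t rows, size_t cols)
{
    if (cols != 0 && rows > (size_t)-1 / cols)
        return NULL;
    return calloc(rows * cols, sizeof(int));
}

/* Hypothetical helper: pointer to the cell at (r, c), row-major.  The
   caller must keep r*cols + c below the total number of cells. */
int *matrix_cell(int *base, size_t cols, size_t r, size_t c)
{
    return base + r * cols + c;
}
```

Because every cell belongs to the same single array object, an index
pair such as (1, 7) with cols == 5 merely names element 12, the same
cell as (2, 2), and stays within the rules quoted from p.48 so long as
the total offset is in range.  That is exactly the guarantee that
int a[4][5] does not provide for a[1][7].
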