Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!clyde.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!apple!usc!wuarchive!udel!haven!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: sizeof and multi-dimensional arrays
Message-ID: <28950@mimsy.umd.edu>
Date: 6 Jan 91 19:27:48 GMT
References: <1991Jan5.050613.22303@Neon.Stanford.EDU> <4596@sactoh0.SAC.CA.US> <10303@hydra.Helsinki.FI> <1991Jan5.232225.14909@ccs.carleton.ca>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 287

First, the instant replays (I guess I saw too much football yesterday
:-) ); then a tutorial essay....

In article <1991Jan5.050613.22303@Neon.Stanford.EDU>
dkeisen@Gang-of-Four.Stanford.EDU (Dave Eisen) asks why, with his
compiler,
>char x[2][3];
>  sizeof (*x) gives 6
>  sizeof (x[0]) gives 3.
>What's the scoop?

(The correct answer is `There is a bug in that compiler.')

In article fred@prisma.cv.ruu.nl (Fred Appelman) writes:
>You are just confused.
>'x' is a two dimensional array of 2*3 elements of type char. Makes a total of
>6. 'x[0]' and 'x[1]' are arrays with a length of 3 elements. So both arrays
>have a size of 3.

This is correct, but does not explain why the compiler produces 6 for
`sizeof (*x)'. (Of course, no one without the source can explain the
particular bug in that compiler.)

In article <4596@sactoh0.SAC.CA.US> jak@sactoh0.SAC.CA.US
(Jay A. Konigsberg) adds:
>Something is wrong here.

(True enough.)

>sizeof(x) makes sense as it is returning the total size declared for
>  the array.
>sizeof(x[0]) makes sense as it returns the total size of that dimension
>  of the array.

Right.

>sizeof(*x) DOES NOT make sense. The size of a pointer on this machine is
>4 bytes. (Note: adding "char *y; sizeof(y) does return 4).

Not right.

In article <10303@hydra.Helsinki.FI> wirzeniu@cs.Helsinki.FI
(Lars Wirzenius) corrects Jay Konigsberg:
>But *x isn't a pointer, it's an array.
>First the type of x decays
>from "array 5 of array 6 of char" into "pointer to array 6 of chars".
>(See for example: _Standard_C_, by P.J.Plauger and Jim Brodie, page 74,
>or K&R-2, Section A7.1, "Pointer Generation", page 200.)
>
>This pointer is dereferenced with '*', and the result is an array of
>type |char [6]|, which has the size 6.

This is exactly right.

Finally, in article <1991Jan5.232225.14909@ccs.carleton.ca> a mystery
person (`Engineers' seems rather an unlikely surname!) given as
bull@ccs.carleton.ca (Bull Engineers) writes:
>Sorry, sizeof(*x) makes perfect sense. Remember, the * operator
>means "evaluate what's at this address". This means, that for
>two-dimensional arrays, *x and x[0] are identical by definition. Try
>this with a three dimensional array z[2][3][4]. sizeof(z) = 24,
>sizeof(z[0]) = 12, and sizeof(*z) = 12 also. Why? Because *z
>dereferences the first (0th) dimension of z.

This is awfully informal, but is the right idea.

[begin tutorial]

Key concepts:
	types
	objects
	values
	contexts (object and value)
	address-of operator `&' changes object to value
	indirect operator `*' changes value to object
	arrays in object contexts remain arrays
	arrays in value contexts become values

C has five different `places' in which array identifiers (including []
and `*') can appear:

  - declarations and definitions:

	int i, a[10], *p;	/* local, global, extern, whatever */

    These can be further divided into formal parameters and all others.

  - `left hand sides' (`to the left of an assignment'):

	i = 3;
	a[2] = 4;

    This includes the `modifying' operators `++' and `--', i.e., in
    the expression

	a[3] = ++i;

    the `i' being incremented is in a `miniature left hand side' of
    its own.

  - `right hand sides':

	p = a;

    Here `p' is in a `left side', or `left value', or `lvalue', context,
    and `a' is in a `right side', or `right value', or `rvalue', context.

  - sizeof:

	sizeof(a)

    An identifier that follows sizeof is treated as if it were in a
    `left value' context.
    (More on this in a bit.)

  - address-of operator:

	&i

    An identifier that follows an address-of ampersand (`&') is also
    treated as if it were an `lvalue'.

Aside from declarations and definitions, then, there are really only two
contexts here, `lvalue' and `rvalue'. Since an `lvalue' identifier need
not actually appear on the left---as is the case with `++i' above---I
prefer to call these `object' and `value' contexts. Other books may use
`lvalue' and `rvalue' respectively.

In an object context, we are interested in the object itself. Usually
the variable name corresponds to some `address' (whatever that is; the C
language does not pin down addresses all that exactly, so that whatever
the system uses for addresses will probably suffice). `i', `a', and `p'
above each have some address%. Each variable has a type, and so each of
these addresses also has a type corresponding to the variable's type:

	name   is a/an            so its address is a
	----   ---------------    ----------------------------
	i      int                pointer to int
	a      array 10 of int    pointer to array 10 of int%%
	p      pointer to int     pointer to pointer to int

This address is what the `&' operator produces. The result of the `&'
operator is itself a value, not an object; a value does not have an
address and it is therefore illegal to try to take it, so `&(&i)' is
illegal. (Most C compilers correctly diagnose this error, although many
do not correctly diagnose `&(&*p)'. This does not make &(&*p) legal:
even though it *could* be defined as &p, it happens that it is not. If
you want &p, write &p.)

-----
% Note that `i', `a', and `p' need not be given addresses unless the
code takes those addresses with `&'. A smart compiler can, if the
machine allows it, put objects into machine registers or other `special'
places. In a few cases, it can do this even when the object's address
is taken. (One example occurs on Pyramid computers, where the registers
have addresses.)
The `register' keyword acts as a promise, and sometimes as a
recommendation: `I promise not to take the address of this variable,
and suggest that the compiler might put it in a machine register.'
Most modern compilers completely ignore the advice, and some do not
even hold you to the promise.

%% In `old C' as defined by K&R 1st edition, &a is illegal. This is no
longer the case; &a is the address of the array `a', and its type is
`pointer to array 10 of int'.
-----

`sizeof' is not really interested in the object's address, but on the
other hand, it is not interested in the object's value either. Objects
that appear in `sizeof' contexts are used only for their type. The size
of that type, whatever it is, is `spliced in' as though it were an
integral constant. (Note that this constant has type `size_t'.) In
other words, given `char c;', writing `sizeof c' is essentially the same
as writing `(size_t)1'.

This leaves assignments and value contexts (and declarations and
definitions, which I am ignoring). Here things start to get a bit
peculiar. For sizeof and address-of, we are only interested in the size
and type of the object that follows, but in assignments and values, we
need the value of the object as well---sometimes to fetch it, sometimes
to set it, sometimes both.

This is all well and good for `simple' objects like `i', for pointers
like `p', and (these days) even for structure and union objects (with
some restrictions). But array objects are different. They get no
respect. An assignment to an array object is simply illegal. (Note
that the initial value that may appear in a definition is not an
assignment%: it is an initializer. That is why it is legal there.)
`i = 3;' is fine, but `a = { 0,1,2,3,4,5,6,7,8,9 };' is not. You might
think, then, that taking the value of an array would also be illegal.

-----
% Well, technically speaking, at least. It looks and acts like an
assignment, but the rules regarding what is and is not legal are
different.
-----

Here is where things get very strange. Instead of being outlawed, an
attempt to take the `value' of an array is treated as an attempt to take
the address of the first element of the array (the one with subscript 0).
So in

	p = a;

the compiler pretends you wrote instead

	p = &a[0];

a[0] is an object of type `int', therefore its address is a value of
type `pointer to int', so we have an assignment with a `pointer to int'
on the left (p) and a `pointer to int' on the right (&a[0]) and
everything is okay.

There is a subtlety here as well. How did we name a[0] in the first
place? The expression a[0] breaks down into four sub-expressions:

	a
	0
	add
	indirect

As above, the `a' turns into the address of a[0]. To this value we add
0 (leaving it unchanged) and then indirect. This changes the value
`pointer to a[0]' into the object `a[0]'. In other words, we have to
know where a[0] is in order to find a[0]! So it is a good thing we can
find a[0] by asking for `a'.

Formally, then, the rule is:

	In a value context, an object of type `array N of T' (where N
	is an integral constant and T is a legal type) becomes a value
	of type `pointer to T' whose value is the address of the first
	element---element number 0---of that array.

Remember also that the `&' address-of operator takes an object and
produces a value, and that the `*' indirect operator takes a value and
produces an object. For `&' the value produced has type `pointer to
...' while for `*' the value consumed must have type `pointer to ...'.
In each case the `...' represents the type of the object (whether
consumed or produced).

Rewinding to the original question, then:

>char x[2][3];
>  sizeof (*x) gives 6
>  sizeof (x[0]) gives 3.
>What's the scoop?

We can see that this is a compiler bug by expanding the two arguments to
`sizeof'. These are each in object context and we want their types.
First we have

	*x

This means that x appears in a value context (`*' takes a value and
produces an object).
It had better come out as a value of type `pointer to ...'. Well: `x'
is an `array 2 of array 3 of char', but as noted above, an array in a
value context gets changed:

	In a value context, an object of type `array N of T' (where N
	is an integral constant and T is a legal type) becomes a value
	of type `pointer to T' whose value is the address of the first
	element---element number 0---of that array.

so we have an array with N=2 and T=`array 3 of char'. This becomes a
value of type `pointer to T', or in this case, `pointer to array 3 of
char', pointing to the first element of x (x[0]). So we can apply the
indirecting `*'. The indirection changes this `pointer to array 3 of
char' into the object `array 3 of char'. Thus we want the size of an
object that is an `array 3 of char'; by definition, this is the value
`3'.

To check `sizeof x[0]', do the same thing. Write down the expression:

	sizeof x[0]

Break down the subexpression x[0] by rewriting according to its
definition:

	*( (x) + (0) )

Handle the subexpression x+0, noting the contexts:

	*( [value] ( [value] (x) + [value] (0) ) )

`x' is an array in a value context, so apply The Rule from above:

	[value] (x) = [value] <object, array 2 of array 3 of char, x>
	            = [value] <value, pointer to array 3 of char, &x[0]>

Adding 0 leaves the pointer unchanged, so apply the `*':

	*( <value, pointer to array 3 of char, &x[0]> )
	            = <object, array 3 of char, x[0]>

Now we have an object in an object context (target of `sizeof') so we
just read its type---`array 3 of char'---and decide its size: 3.

Incidentally, sizeof can handle values as well as objects:
`sizeof (3+4)' produces the same constant as `sizeof(int)'. (The
parentheses are needed here: sizeof binds more tightly than `+', so
`sizeof 3+4' means `(sizeof 3)+4'.) Sizeof is unique in this; other C
operators that take objects refuse to work on values. Of course, sizeof
can also take a type in parentheses, which shows just how special it is.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris