Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!pacbell.com!att!ucbvax!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.std.c
Subject: Re: Pointers to Incomplete Types in Prototypes
Message-ID: <12818@dog.ee.lbl.gov>
Date: 4 May 91 23:16:35 GMT
References: <700@taumet.com> <683g+p#@rpi.edu> <709@taumet.com>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 158
X-Local-Date: Sat, 4 May 91 16:16:35 PDT

(This is worth a try, anyway....)

The ANSI scoping rules are actually quite simple, but they do produce surprising results sometimes. Here is one way (I believe correct :-) ) to work out what happens.

Item: file scope is called `level 0'. File scope ends only at the end of a `source unit' (the source file).

Item: braces (the characters `{' and `}') delimit scopes. An open brace introduces a new scope; this scope ends at the corresponding close brace. In effect, `{' increments the current scope and `}' decrements the current scope.

Item: function parameters in all forms of declaration and definition appear at scope level 1. Variables (and `goto' labels) that appear inside the function are at level 2 or higher. This is necessary, among other reasons, because

    f(p)
        char *p;
    {
        int p = 3;
        ...
    }

is legal (if a bit peculiar).

Item: `extern' declarations are inserted at the current scope level (this differs from pcc, in which extern declarations are inserted in scope 0, regardless of the current scope). Goto labels are inserted in scope 2 (so that you can jump across braces).

Now, inner scope declarations (higher numbers) of some name may give it a type that differs from an outer (lower number) scope declaration; for instance, in the f(p) example above, the `int' p is not at all the same as the `char *' p.
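A small C sketch of the nesting rules as current compilers apply them; the function names shadow_demo and goto_demo are hypothetical. It shadows p in a nested block rather than directly in the function's outermost block, since current GCC and Clang reject a redeclaration of a parameter at that outermost level (ISO C gives the parameters and that block the same scope).

```c
/* p is a parameter: level 1 in the model above. */
int shadow_demo(const char *p)
{
    int from_param = (p[0] == 'x');  /* this p is the const char * parameter */
    int from_block;

    {                                /* a nested block: one level deeper */
        int p = 3;                   /* a different p, of type int */
        from_block = p;
    }                                /* the int p is gone again here */

    return from_param * 10 + from_block;
}

/* Labels belong to the whole function, so goto can leave nested braces. */
int goto_demo(int n)
{
    int i;

    for (i = 0; i < 10; i++) {
        if (i == n)
            goto done;
    }
done:
    return i;
}
```

Calling shadow_demo("xyz") sees the char * p before the inner block and the int p inside it.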
To disambiguate these, you should keep a mental `stack of paper' on your left, with one sheet per scope, and one very large sheet on your right. (You could share the level 0 page for this, but it is easier to imagine a separate sheet.) Here is how you work them. (The following ignores name space separation: variables, structure tags, and goto labels each get their own left-pile/right-page pair. But it is good enough for illustration.)

Whenever you come across a *declaration* for a name, you search for that name on the top page on your left (the highest numbered scope). If it appears, you probably have a redeclaration error (e.g., `int k; int k;' inside a block; but see below). If not, however, you:

 1. Write the name on the right sheet of paper. Append a number. The number can be any that you have not written on the right before, but it is easiest to start at 0 or 1 and increase. Thus, given `int k;' at scope 0 with both pages blank, you would write:

        k<0>

    on the right page.

 2. Write the name and the same number on the topmost sheet on the left; here, k<0>. This counts as the declaration. It may be `incomplete' if this is a struct or union, in which case we need one more declaration to `complete' it. (This is what `struct foo;', with no structure contents and no variable names, is for. It is a special case.)

Whenever you come across a *reference* to a name, you search for that name in *all* the pages on your left, starting at the top (the highest numbered scope). If it appears, you take its number; this is the `real' name for that identifier. If it does *not* appear, this may be an error (e.g., `return foo;' where `foo' is undefined) or it may count as an `incomplete' declaration (e.g., `struct glorp *' where `struct glorp' is undefined). If it is an incomplete declaration, it works just as described above. A later declaration completes an incomplete type only if it lands on the same sheet of paper on the left.
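A minimal C sketch of the `struct glorp *' case, assuming a C99 compiler; glorp_ptr, one and glorp_sum are hypothetical names. The pointer declaration is a reference to a tag that appears on no sheet, so it records an incomplete struct glorp at file scope, and the later definition on that same file-scope sheet completes it.

```c
/* A reference to a tag that is not yet declared: with no `struct glorp'
   on any sheet, this records an incomplete struct glorp at file scope. */
struct glorp *glorp_ptr;

/* A definition on the same (file-scope) sheet completes that same type,
   so glorp_ptr now points to a complete structure. */
struct glorp {
    int x;
    int y;
};

static struct glorp one = { 1, 2 };

/* Hypothetical helper: member access is allowed only because the type
   glorp_ptr points to has been completed. */
int glorp_sum(void)
{
    glorp_ptr = &one;
    return glorp_ptr->x + glorp_ptr->y;
}
```

Had the definition appeared inside a function body instead, it would have been a new sheet and a different struct glorp, and glorp_ptr would still point to an incomplete type.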
Whenever you open a new scope, you add a blank sheet of paper on the left, on top of the pile. Whenever you close it, you throw away the top sheet. The sheet on the right remains `active' for the whole file.

(There is a special case for `typedef': when you see `typedef foo bar;', rather than looking for bar, not finding it, and giving it a *new* number, you look for bar, do not find it, and give it the *same* number you found for foo. Typedefs for `base' types [int, char, etc.] can be written as `bar<int>', if you like. But never mind that.)

Two `struct' types are the same ONLY IF THEIR NUMBERS MATCH.

Okay, so now what happens with incomplete structure types that appear at various scope levels? Suppose we have

    void f(p)
        struct a *p;
    {
        struct a { int a; };
        ...

Tracking only the `struct' declarations, we start with two blank sheets. Between `f(p)' and the first `{' we add a new scope (a new blank sheet) on the left, and we look for `struct a' on both left-hand pages (because this pointer refers to struct a, i.e., this is a reference to, not a definition of, `struct a'). It is not there, so we add an incomplete declaration, writing

    struct a<0>

on the right and copying it to the top sheet on the left. (We still do not know what `struct a' is.)

Then we take the open brace, which adds a new scope, so we put a third sheet of paper on the left. Now we come across a definition for a `struct a', but there is no `struct a' on the top page on the left, so we add a new one on the right:

    struct a<1>

and copy that to the left. The result is that `p' points to a `struct a<0>', but the only `struct a' we know about is a `struct a<1>'. p is thus largely useless. When we reach the final `}' closing function f, we throw away the top two left-hand sheets, going back to our blank one, and so if we declare another `struct a' it is a `struct a<2>'.
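The left-hand pile and the right-hand sheet map fairly directly onto a scope stack plus one global counter. Below is a minimal C sketch under several assumptions: it tracks a single name space (keys are strings such as "struct a"), ignores the typedef special case, does not record whether a type is complete (so a repeated declaration on the top sheet simply returns the existing number, as completing an incomplete type would), and uses arbitrary fixed-size tables. All function names are hypothetical.

```c
#include <string.h>

#define MAX_SCOPES 16  /* arbitrary limits for this sketch */
#define MAX_NAMES  64

struct entry {             /* one line on a left-hand sheet: name<number> */
    const char *name;      /* must outlive the table (string literals here) */
    int number;
};

struct sheet {             /* one left-hand sheet per open scope */
    struct entry names[MAX_NAMES];
    int count;
};

static struct sheet left[MAX_SCOPES]; /* the pile; left[0] is file scope */
static int top;                       /* index of the topmost sheet */
static int next_number;               /* next number unused on the right */

/* Start a new file: one blank file-scope sheet, nothing on the right. */
void reset(void)
{
    top = 0;
    left[0].count = 0;
    next_number = 0;
}

/* `{' (or a parameter list): put a blank sheet on top of the pile. */
int open_scope(void)
{
    if (top + 1 >= MAX_SCOPES)
        return -1;
    left[++top].count = 0;
    return 0;
}

/* `}': throw away the top sheet (never the file-scope sheet). */
void close_scope(void)
{
    if (top > 0)
        top--;
}

/* Search a single sheet; return the name's number, or -1 if absent. */
static int find_on(const struct sheet *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i].name, name) == 0)
            return s->names[i].number;
    return -1;
}

/* Declaration: search only the top sheet. If the name is already there,
   return its existing number (completion of an incomplete type, or a
   redeclaration error this sketch does not diagnose); otherwise write it
   on the right with a fresh number and copy it to the top sheet. */
int declare(const char *name)
{
    struct sheet *s = &left[top];
    int n = find_on(s, name);
    if (n >= 0)
        return n;
    if (s->count >= MAX_NAMES)
        return -1;
    n = next_number++;
    s->names[s->count].name = name;
    s->names[s->count].number = n;
    s->count++;
    return n;
}

/* Reference: search every sheet, top down. If the name is missing, treat
   it as an incomplete declaration on the top sheet (right for struct tags;
   for an ordinary identifier it would be an error instead). */
int reference(const char *name)
{
    for (int level = top; level >= 0; level--) {
        int n = find_on(&left[level], name);
        if (n >= 0)
            return n;
    }
    return declare(name);
}
```

Replaying the f(p) walk-through with this sketch: after reset() and open_scope(), reference("struct a") returns 0; after another open_scope(), declare("struct a") returns 1; after two close_scope() calls, declare("struct a") returns 2, matching struct a<0>, struct a<1> and struct a<2>.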
This works the same whether f is written as

    void f(p) struct a *p; {

or

    void f(struct a *p) {

or even

    void f(struct a *p);

Now consider what happens if we have:

    struct a;
    void f(struct a *p);

This time we write `struct a<0>' on the right and copy it to the left before we add any more sheets of paper. This gives us an incomplete declaration of the structure `a'. Next, we put down a new blank sheet. We then see a reference to `struct a'. We search through both sheets on the left and, voila!, find `struct a<0>'. p thus points to a `struct a<0>'. The level-1 scope (top page) disappears after the second semicolon, and if we encounter a

    struct a { ... };

definition, this `fleshes out' the struct a<0>.

[Now that I have done all this, it occurs to me that it might be simpler to tag each declaration with its scope level, rather than a global unique number, at least for discussion. The BSD debugging symbol table format uses global unique numbers, which is why I did it this way. The initial number is not 0, however; the first few numbers are assigned to the `base types'. This is what the strings in a `.stabs' directive are all about.]
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov