Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!uwvax!tank!uxc.cso.uiuc.edu!garcon!uicsrd.csrd.uiuc.edu!mcdaniel
From: mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel)
Newsgroups: comp.lang.c
Subject: Re: A nice macro
Summary: The problem is address overflow
Keywords: macros, arrays, addresses, overflow
Message-ID: <1330@garcon.cso.uiuc.edu>
Date: 22 Jun 89 08:06:48 GMT
References: <2784@solo8.cs.vu.nl>
Sender: news@garcon.cso.uiuc.edu
Reply-To: mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel)
Organization: Center for Supercomputing R&D (Cedar), U. of Ill.
Lines: 145

In article <2784@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
> An often-heard complaint by Pascal dweebs on C is the absence of the
> equivalence of
>     VAR foo: array[-5..-2] of bar;

I am by no means a "Pascal dweeb", because I don't particularly care for the language. However, there are many problems in which it is natural to have the subscript range of an array be other than 0..N-1. (But, presumably, Maarten meant "one is a Pascal dweeb => one tends to complain about C subscripts" rather than "one tends to complain about C subscripts => one is a Pascal dweeb".)

One application for non-0-based subscripting is, in fact,
> in the MINIX kernel's `proc' table user processes have positive
> indices, while kernel tasks have negative.
>     #define HIGH -2
>     #define LOW -5
>     bar foo[HIGH - LOW + 1];
>     #define foo_addr(n) &foo[(n) - LOW]
>
> By this scheme every `zork(n)' might be an array reference instead
> of a function call/function-like macro invocation. :-(

Not an array "reference", whatever that means, but an expression evaluating to a pointer (into an array). It is certainly permitted in C to have a function or a macro yield a pointer, as in:
    *foo_addr(-3) = 15;
It might be disconcerting, as you note, to see
    zork(5) = 20;
in a C program.

> I doubt Chris was the person who suggested this; the solution below
> seems so straightforward:

The keyword here is "seems".
If you ever think that Chris Torek has made a mistake, it is prudent to think again. The odds say that you made the mistake.

>     bar _foo[HIGH - LOW + 1];       /* #1 */
>     #define foo (_foo - LOW)        /* #2 */
[with the example]
>     foo[-4] == (_foo - -5)[-4] == *((_foo + 5) - 4) ==    /* #3 */
>     *(_foo + 1) == _foo[1]

Objection 1: in pANS C, many identifiers starting with "_" are reserved for the implementation. Unfortunately, I don't have the rules handy, so I can't tell the circumstances under which this declaration would be legal. If "_foo" is extern, I'm pretty sure it is illegal. "real_foo" would be a better choice.

Objection 2 (minor): with this #define, constructs like
    a = func foo;
would be syntactically legal, and
    foo = 10;
would give a confusing error message. I might prefer
    #define foo &_foo[-LOW]
which is (more or less) equivalent.

Objection 3: the result of the computation is undefined in pANS C.
>     foo[-4] == (_foo - -5)[-4] == *((_foo + 5) - 4)
OK so far, in a syntactic sense at least.
>     == *(_foo + 1)
Wrong, at least in pANS C. A pANS C implementation is not permitted to rearrange expressions so as to ignore parentheses: "(a+b)+c" must be computed by adding a to b, and then adding the result to c.% Since HIGH-LOW+1 is 4, the declaration of _foo is
    bar _foo[4];
Elements _foo+0 through _foo+3 exist, and the address _foo+4 (one past the end) may be computed, but the expression given is
    *((_foo + 5) - 4)
and merely computing the address _foo+5 is undefined. In particular, an implementation is permitted to abort the program or generate a random address.

Under what conditions is Maarten's scheme guaranteed to work? (From here on, assume there's no integer overflow.) First, the declaration
    bar id[HIGH - LOW + 1];
must be legal. Since C does not (currently) allow 0-sized arrays,
    HIGH - LOW + 1 >= 1
so
    HIGH - LOW >= 0
or
    HIGH >= LOW
The other conditions are derived from the requirement that
    #define foo (id - LOW)
generate a valid address.
The valid addresses from id are id+0 through id+(HIGH-LOW+1) inclusive (the last being one past the end, which may be computed but not dereferenced), so the first condition is
    id - LOW >= id + 0
hence
    -LOW >= 0
or
    LOW <= 0
and the second condition is
    id - LOW <= id + HIGH - LOW + 1
hence
    -LOW <= HIGH + 1 + (-LOW)
or
    0 <= HIGH + 1
or
    HIGH >= -1
So the three preconditions for Maarten's scheme to be guaranteed to work are
    HIGH >= LOW
    LOW <= 0
    HIGH >= -1
Maarten's second example (LOW = -5, HIGH = -2) fails the third constraint, and thus is not portable under pANS C. (His first example, about the MINIX process table, would work.) In practice, almost all architectures will compute it "right", but that's no consolation when you try to port to the odd one.

The first #define, attributed to Chris Torek,
>     #define foo_addr(n) &foo[(n) - LOW]
might have a similar problem. "a[b]" was defined by K&R to be identical to "*(a+b)", so
    &foo[(n) - LOW]   <==>   (foo + (n) - LOW)
I don't know whether pANS C specifies the same identity,# or what it says about evaluating expressions without parentheses. If compilers are allowed to rearrange, a compiler might instead compute
    (foo - LOW + (n))
which would have the same overflow conditions as Maarten's scheme. With extra parentheses, any LOW and HIGH pair may be used when LOW <= HIGH (if there's no integer overflow):
>     #define foo_addr(n) (&(foo)[ ((n)-(LOW)) ])

There's a general rule of thumb: in the right-hand side of a macro definition, parenthesize everything, even where you think parentheses are unnecessary. Here is yet another example of where the rule of thumb is useful.

% Actually, there's the "as if" rule, which permits an implementation to do anything it wants as long as the result derived is correct for all cases in which pANS C defines an answer. For example, if "a+b" would overflow, the value of "(a+b)+c" is not defined under pANS C, and an implementation may do whatever it likes. It might produce, in this case, "a+(b+c)", which might happen to be the numerically correct answer.
Or it might send 110 volts at 100 amps AC through your chair.

# Actually, I'm implicitly assuming another identity, that &*x == x for any address x. I don't know about this identity under pANS C, either.
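A minimal sketch in C of the portable alternative discussed above, assuming `bar` is `int` (the post leaves it unspecified) and using Maarten's second range, LOW = -5 and HIGH = -2, which fails the HIGH >= -1 precondition. It follows the suggestions above: the storage is named `real_foo`, and `foo_addr` is the fully parenthesized form. `OFFSET_BASE_OK` and `foo_at` are hypothetical helpers, not from the thread: the first evaluates the three preconditions for the `(id - LOW)` scheme, and the second is a range-checked accessor.

```c
#include <assert.h>

/* Assumed element type; the post leaves "bar" unspecified. */
typedef int bar;

/* Maarten's second example: indices -5..-2.  This range fails the
   HIGH >= -1 precondition, so the "(id - LOW)" base-pointer trick
   is not portable here. */
#define LOW  (-5)
#define HIGH (-2)

/* Compile-time check that HIGH - LOW + 1 >= 1: if HIGH < LOW, the
   array size becomes -1 and the compiler must issue a diagnostic. */
typedef char bounds_ok[((HIGH) >= (LOW)) ? 1 : -1];

/* The real storage, indexed 0..HIGH-LOW. */
static bar real_foo[(HIGH) - (LOW) + 1];

/* Fully parenthesized accessor.  The subscript is a complete
   expression, so the integer (n) - (LOW) is computed first and only
   then added to real_foo.  For n in LOW..HIGH the only pointers formed
   are real_foo + 0 through real_foo + (HIGH - LOW). */
#define foo_addr(n) (&(real_foo)[((n) - (LOW))])

/* Hypothetical helper: nonzero when the three preconditions hold,
   i.e. when id - LOW lies within id + 0 .. id + (HIGH - LOW + 1). */
#define OFFSET_BASE_OK(low, high) \
    ((high) >= (low) && (low) <= 0 && (high) >= -1)

/* Hypothetical range-checked accessor built on foo_addr. */
static bar *foo_at(int n)
{
    assert(n >= LOW && n <= HIGH);
    return foo_addr(n);
}
```

For example, `*foo_addr(-4) = 15;` stores into `real_foo[1]`. `OFFSET_BASE_OK(LOW, HIGH)` is 0 for this range, while a range spanning negative and positive indices, such as `OFFSET_BASE_OK(-5, 3)`, gives 1, matching the observation that the MINIX-style table would work.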