From: mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel)
Newsgroups: comp.lang.c
Subject: Re: A nice macro
Summary: The problem is address overflow
Keywords: macros, arrays, addresses, overflow
Message-ID: <1330@garcon.cso.uiuc.edu>
Date: 22 Jun 89 08:06:48 GMT
References: <2784@solo8.cs.vu.nl>
Sender: news@garcon.cso.uiuc.edu
Reply-To: mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel)
Organization: Center for Supercomputing R&D (Cedar), U. of Ill.
Lines: 145

In article <2784@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
> An often-heard complaint by Pascal dweebs on C is the absence of the
> equivalence of
>	VAR foo: array[-5..-2] of bar;

I am by no means a "Pascal dweeb", because I don't particularly care
for the language.  However, there are many problems in which it is
natural to have the subscript range of an array be other than 0..N-1.
(But, presumably, Maarten meant
    "one is a Pascal dweeb => one tends to complain about C subscripts"
rather than
    "one tends to complain about C subscripts => one is a Pascal dweeb").

One application for non-0-based-subscripting is, in fact,
> in the MINIX kernel's `proc' table user processes have positive
> indices, while kernel tasks have negative. 

>	#define		HIGH		-2
>	#define		LOW		-5
>	bar	foo[HIGH - LOW + 1];
>	#define		foo_addr(n)	&foo[(n) - LOW]
>
> By this scheme every `zork(n)' might be an array reference instead
> of a function call/function-like macro invocation. :-(

Not an array "reference", whatever that means, but an expression
evaluating to a pointer (into an array).  It is certainly permitted in
C to have a function or a macro return a pointer, as in:
        *foo_addr(3) = 15;
It might be disconcerting, as you note, to see
        zork(5) = 20;
in a C program.
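
A minimal sketch of the pointer-yielding macro in use.  The element
type "bar" is assumed to be int, the helper names store_and_load and
slot_of are made up for illustration, and LOW and HIGH are
parenthesized here (the quoted definitions are not).  The argument must
lie in LOW..HIGH for the resulting pointer to be dereferenced.

```c
#include <stddef.h>

/* Hypothetical element type and bounds, chosen only for illustration. */
typedef int bar;

#define LOW   (-5)
#define HIGH  (-2)

static bar foo[HIGH - LOW + 1];        /* four elements: logical -5..-2 */

/* Expression macro yielding a pointer to the element with logical index n. */
#define foo_addr(n)  (&foo[(n) - LOW])

/* Store through the pointer the macro yields, then read the value back. */
static bar store_and_load(int n, bar value)
{
    *foo_addr(n) = value;              /* assignment through the pointer */
    return *foo_addr(n);
}

/* Physical slot (0-based) that logical index n maps to. */
static ptrdiff_t slot_of(int n)
{
    return foo_addr(n) - foo;
}
```

For instance, store_and_load(-4, 15) writes into foo[1], and
slot_of(LOW) is 0.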

> I doubt Chris was the person who suggested this; the solution below
> seems so straightforward:

The keyword here is "seems".  If you ever think that Chris Torek has
made a mistake, it is prudent to think again.  The odds say that you
made the mistake.

>	bar	_foo[HIGH - LOW + 1];                   /* #1 */
>	#define foo         (_foo - LOW)                /* #2 */
[with the example]
>	foo[-4] == (_foo - -5)[-4] == *((_foo + 5) - 4) == /* #3 */
>           *(_foo + 1) == _foo[1]

Objection 1: in pANS C, many identifiers starting with "_" are
reserved for the implementation.  I don't have the rules handy, but as
I recall every identifier starting with "_" is reserved at file scope,
so a file-scope "_foo" (extern or static) is not available to a
portable program.  "real_foo" would be a better choice.

Objection 2 (minor): with this #define, constructs like
        a = func foo;
would be syntactically legal, and
        foo = 10;
would give a confusing error message.  I might prefer
        #define foo     &_foo[-LOW]
which is (more or less) equivalent, except that "[]" binds tighter
than "&", so a subscripted use must then be written "(foo)[-4]".

Objection 3: the result of the computation is undefined in pANS C.
>	foo[-4] == (_foo - -5)[-4] == *((_foo + 5) - 4)
OK so far, in a syntactic sense at least.
>           == *(_foo + 1)
Wrong, at least in pANS C.  pANS C is not permitted to rearrange
expressions so as to ignore parentheses.  "(a+b)+c" must be computed
by adding a to b, and then adding the result to c.%  Since HIGH-LOW+1
is 4, the declaration of _foo is
        bar _foo[4];
Elements _foo+0 through _foo+3 exist, and the address _foo+4 (one past
the end) may be computed but not dereferenced.  The expression given,
however, is
        *((_foo + 5) - 4)
and merely computing the address _foo+5 is undefined.  In particular,
an implementation is permitted to abort the program or generate a
random address.
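
A hedged sketch of the distinction, under the assumption that "bar" is
int.  The names real_foo, offset_is_computable and elem are invented
for this example.  The first helper only does integer comparisons, so
it never forms the questionable pointer; the second does all the index
arithmetic in integers before touching the array.

```c
/* Hypothetical element type and bounds, chosen only for illustration. */
typedef int bar;

#define LOW    (-5)
#define HIGH   (-2)
#define COUNT  (HIGH - LOW + 1)        /* 4 */

static bar real_foo[COUNT];

/* Nonzero if real_foo + offset is an address C lets us compute:
   offsets 0 through COUNT inclusive (COUNT is one past the end). */
static int offset_is_computable(long offset)
{
    return offset >= 0 && offset <= COUNT;
}

/* Reach logical index n by subtracting LOW as an integer first, so the
   only pointer ever formed is real_foo + (n - LOW). */
static bar *elem(int n)
{
    return &real_foo[n - LOW];
}
```

Here offset_is_computable(-(long)LOW) asks about real_foo + 5 and
reports that it is out of range, while elem(-4) lands on real_foo[1]
without passing through that address.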

Under what conditions is Maarten's scheme guaranteed to work?  (From
here on, assume there's no integer overflow.)

First, the declaration
        bar id[HIGH - LOW + 1];
must be legal.  Since C does not allow 0-sized arrays (currently),
        HIGH - LOW + 1 >= 1
so      HIGH - LOW >= 0
or      HIGH >= LOW

The other conditions are derived from the requirement that
        #define foo         (id - LOW)
generate a valid address.  The valid addresses from id are id+0
through id+(HIGH-LOW+1) inclusive, so the first condition is
        id - LOW >= id + 0
hence   -LOW >= 0
or      LOW <= 0

and the second condition is
        id - LOW <= id + HIGH - LOW + 1
hence   -LOW <= HIGH + 1 + (-LOW)
or      0 <= HIGH + 1
or      HIGH >= -1

So the three preconditions for Maarten's scheme to be guaranteed to
work are
        HIGH >= LOW
        LOW <= 0
        HIGH >= -1
Maarten's second example fails the third constraint, and thus is not
portable under pANS C.  (His first example, about the MINIX process
table, would work.)  In fact, almost all architectures will do it
"right", but that's no consolation when you try to port to the odd
one.
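
The three preconditions can be written down as a small predicate.  This
is only a sketch; the function name biased_base_is_portable is made up,
and long is assumed wide enough for the bounds in question.

```c
/* Nonzero when "#define foo (id - LOW)" yields an address within
   id+0 .. id+(HIGH-LOW+1), i.e. when all three preconditions hold. */
static int biased_base_is_portable(long low, long high)
{
    return high >= low      /* the array has at least one element     */
        && low <= 0         /* id - LOW is not before id + 0          */
        && high >= -1;      /* id - LOW is not past id + (HIGH-LOW+1) */
}
```

With LOW = -5 and HIGH = -2 it returns 0 (the third test fails).  With
a hypothetical process-table range such as -8..15 it returns 1, and
with a Pascal-style 1..10 it returns 0 because LOW is positive.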

The first #define, attributed to Chris Torek,

>	#define		foo_addr(n)	&foo[(n) - LOW]

might have a similar problem.  "a[b]" was defined by K&R to be
identical to "*(a+b)", so
        &foo[(n) - LOW] <==> (foo + (n) - LOW)
I don't know whether pANS C specifies the same identity,# or what it
says about evaluating expressions without parentheses.  If compilers
are allowed to rearrange, a compiler might instead compute
         (foo - LOW + (n))
which would have the same overflow conditions as Maarten's scheme.
With extra parentheses, any LOW and HIGH pair may be used when
LOW <= HIGH (if there's no integer overflow):

>	#define		foo_addr(n)	(&(foo)[  ((n)-(LOW))  ])

There's a general rule of thumb: in the right-hand side of a macro
definition, parenthesize everything, even where you think parentheses
are unnecessary.  Here's yet one more example of where the rule of
thumb is useful.
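
A small illustration of the rule of thumb, again assuming "bar" is int.
COUNT_BARE, COUNT_SAFE, doubled_bare, doubled_safe and fill are names
invented for this sketch.  The bare macro changes meaning once it is
used inside a larger expression; the parenthesized one does not.

```c
/* Hypothetical element type and bounds, chosen only for illustration. */
typedef int bar;

#define LOW   (-5)
#define HIGH  (-2)

#define COUNT_BARE  HIGH - LOW + 1            /* no outer parentheses */
#define COUNT_SAFE  ((HIGH) - (LOW) + 1)      /* fully parenthesized  */

static bar real_foo[COUNT_SAFE];

/* Fully parenthesized accessor, same shape as the macro above. */
#define foo_addr(n)  (&(real_foo)[((n) - (LOW))])

/* Expands to 2 * (-2) - (-5) + 1, which is not twice the count. */
static int doubled_bare(void) { return 2 * COUNT_BARE; }

/* Expands to 2 * ((-2) - (-5) + 1), which is twice the count. */
static int doubled_safe(void) { return 2 * COUNT_SAFE; }

/* Store each logical index in its own slot; return how many were set. */
static int fill(void)
{
    int n, count = 0;
    for (n = LOW; n <= HIGH; n++) {
        *foo_addr(n) = n;
        count++;
    }
    return count;
}
```

doubled_bare() yields 2 while doubled_safe() yields 8, and fill()
visits all four elements without forming any address outside the
array.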


% Actually, there's the "as if" rule, which permits an implementation
to do anything it wants as long as the result derived is correct for
all cases in which pANS C defines an answer.  For example, if "a+b"
would overflow, the value of "(a+b)+c" is not defined under pANS C,
and an implementation may do whatever it likes.  It might produce, in
this case, "a+(b+c)", which might happen to be the numerically correct
answer.  Or it might send 110 volts at 100 amps AC through your chair.

# Actually, I'm implicitly assuming another identity, that
        &*x == x
for any address x.  I don't know about this identity under pANS C,
either.