Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!uxc!garcon!uicsrd.csrd.uiuc.edu!mcdaniel
From: mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel)
Newsgroups: comp.lang.c
Subject: Adding two pointers
Summary: Long, about 150 lines
Message-ID: <922@garcon.cso.uiuc.edu>
Date: 6 May 89 06:57:49 GMT
References: <2765@buengc.BU.EDU> <563@lzaz.ATT.COM> <4093@ficc.uu.net>
Sender: news@garcon.cso.uiuc.edu
Reply-To: mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel)
Organization: Center for Supercomputing R&D (Cedar), U. of Ill.
Lines: 173

In this article, I give two possible definitions of pointer addition.
I then indicate that these definitions cannot be implemented
efficiently on most architectures.  I show that Peter de Silva's card
analogy is misleading.  At the end, I question the need for pointer
addition, and assign extra-credit problems for the interested student.
:-)


In article <4093@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>In article <563@lzaz.ATT.COM>, hutch@lzaz.ATT.COM (R.HUTCHISON) writes:
>> 	midpoint_pointer = (start_pointer + end_pointer) / 2;
>You're right. It's a valid operation.

At most, it's "valid" only in some "human semantic" sense.  As has
been pointed out before, it's not "valid" in the C language at
present.


How in creation do you define pointer addition?

One possible definition is this: 'For pointers P and Q of the same
type, both non-null and [WLOG] pointing into the same array, the value
of P+Q is defined to be the bit pattern which is the unsigned integral
sum of the bit patterns of P and Q.'  This is a bad definition,
because pANS C quite rightly says nothing about "bit patterns",
particularly of pointers, but I'll assume that you know what I mean.

Can we just add P and Q's internal bit representation as unsigned
integers?  No -- even one addition can overflow.

For example, suppose that all machine addresses are in the range
0 to 255.  Let Start be at 200, and End be at 220 (both are valid
addresses).  Suppose we want to compute "Total = Start + End;".
        Start + End = 200 + 220 = "420", which cannot be represented.
        Assume the result is mod 256 (a la unsigned ops in pANS C)
        Total is "420" % 256 = 164.
So Total/2, which by Peter's reasoning should be the average of Start
and End, is 82 -- NOT what was intended.  (If you prefer addresses to
be -128 to 127, consider Start = 72 and End = 92.  The same problem
occurs, so I consider only unsigned addition here.)

If overflow can occur for just one addition of two valid addresses,
there's not much use for the concept.

How would you fix this problem?  If all data addresses (due to the
stack, malloc, or static data) are in the lower half of memory
addresses, overflow can't occur in one addition.  In this case,
though, the stack has to grow upward -- suddenly, the idea of adding
pointers ripples all the way back to the hardware designer!

Even if data be restricted to low memory (and note that some current
implementations put data in high memory), a statement like
        Total = Loc1 + Loc2 + Loc3;
can still cause overflow.  If you choose any fixed-size representation
for pointers, I can add as many addresses as I like and cause overflow.

Perhaps, just like for other data (integers, floating-point), it might
be the user's responsibility to avoid overflow.  In that case, I have
to use
        Average = (End - Start) / 2 + Start;
anyway, or the equivalent for the three-Loc case, and there's no use
for pointer addition.


Here's another possible definition, more in line with pANS C concepts.
Maybe you think it's too restrictive, but I intend to show that even
this restrictive definition can't be implemented efficiently.

'Two pointers P and Q may be added if they are of the same type and
point into the same array A.  If P points at the Ith element of A, and
Q points at the Jth element of A, then P+Q has the same type as P and
Q, and P+Q points to the (I+J)th element of A, if there is one.  Under
any other conditions, the addition is undefined.'

This has different effects from the previous definition.  If
P=Q=&Array[0], here P+Q=P=Q.  Before, P+Q=P only if there's overflow
or if Q's internal bit-pattern representation is all zeroes.

Consider an implementation of this.  Let Start and End be pointers to
elements an array whose first address is Array.  If we want to compute
"Total = Start + End;", we can compute it something like this:
        Offset1 = Start - Array;
        Offset2 = End   - Array;
        Total = Array + Offset1 + Offset2;
or just
        Total = Start + End - Array;
If it overflows, the result should be undefined anyway.

But how do we know what Array is, given just Start or End?  Start and
End may have been computed in one function, and passed to a function
in a separate file.  The only way to know the value of Array is if all
pointers are implemented as pairs "(pointer,base)", like
"(Start,Array)" and "(End,Array)".

Then all pointers are twice as long as they used to be.  They take up
twice as much space on the stack as arguments.  They probably take
twice as long to manipulate.

One of the uses of pointers is to move around in arrays.  But if
pointers have to be pairs of addresses, we should use subscripts
instead, for many architectures.  Pointer arithmetic is generally
useful only for moving within arrays.  If subscripting is more
efficient, why have pointer arithmetic at all?

Are there any other possible implementations?


Here's how Peter da Silva's card analogy fails:
>What's the average of the 7 of clubs and the 9 of clubs?

Overflow: if you add them, you get the 16 of clubs, which does not
exist.  Peter carried more precision in the intermediate results than
is permitted in the representation.

Knowledge of the base: Peter knows that the low card is the 2 (or ace,
if you like).  With a bare pointer, all we know is that we're looking
at the (Querg) of clubs and the (Querg+2) of clubs.

In short, we either have to make each suit arbitrarily large, or we
can add only the low cards, or we have to know the low card in each
suit, or we have to avoid addition altogether.


One question has not yet been answered, and it makes this whole fiasco
moot.  What additional power does pointer addition give?  What can be
done with it that can't be done about as easily as with pointer
subtraction?  The example given, taking an average, is not such a
case.  To find the point x% of the way between Start and End, you
could use
        x*Start + (1-x)*End
(reducing to (Start+End)/2 in the special case x=50%), but you can
also use
        x*(End-Start) + Start
which uses only pointer subtraction, is more efficient in the general
case, and takes only one extra add in the x=50% case.

Are there any other applications in which pointer addition is useful?


For extra credit: :-)

Suppose all addresses on the machine in question are integers in the
range 0 to 255 (like a small VAX), and all integral types (signed and
unsigned) take exactly 8 bits.

1. Give a simple implementation of pointer subtraction as defined in
pANS C, for P and Q pointers to some type T, ignoring overflow.

2. The difference of two pointers must be storable in some signed
integral type.  Overflow is not permitted.  Give a simple condition on
the implementation that prevents overflow.  (Hint: If we have an array
in memory, "char A[201];", what are the values of "&A[200] - &A[0]"
and "&A[0] - &A[200]"?  What does this imply about the declaration of
A?)

3. I do a little sleight-of-hand above.  Peter de Silva's original
example was
 	midpoint_pointer = (start_pointer + end_pointer) / 2;
but my example was
        Total = Start + End;
How does pANS C justify my substitution?

4. What does "WLOG" mean?

--

             Tim, the Bizarre and Oddly-Dressed Enchanter

Center for      |||  Internet, BITNET:  mcdaniel@uicsrd.csrd.uiuc.edu
Supercomputing  |||  UUCP:     {uunet,convex,pur-ee}!uiucuxc!uicsrd!mcdaniel
Research and    |||  ARPANET:  mcdaniel%uicsrd@uxc.cso.uiuc.edu
Development,    |||  CSNET:    mcdaniel%uicsrd@uiuc.csnet
U of Illinois   |||  DECnet:   GARCON::"mcdaniel@uicsrd.csrd.uiuc.edu"