Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!uxc!garcon!uicsrd.csrd.uiuc.edu!mcdaniel From: mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) Newsgroups: comp.lang.c Subject: Adding two pointers Summary: Long, about 150 lines Message-ID: <922@garcon.cso.uiuc.edu> Date: 6 May 89 06:57:49 GMT References: <2765@buengc.BU.EDU> <563@lzaz.ATT.COM> <4093@ficc.uu.net> Sender: news@garcon.cso.uiuc.edu Reply-To: mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) Organization: Center for Supercomputing R&D (Cedar), U. of Ill. Lines: 173 In this article, I give two possible definitions of pointer addition. I then indicate that these definitions cannot be implemented efficiently on most architectures. I show that Peter de Silva's card analogy is misleading. At the end, I question the need for pointer addition, and assign extra-credit problems for the interested student. :-) In article <4093@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes: >In article <563@lzaz.ATT.COM>, hutch@lzaz.ATT.COM (R.HUTCHISON) writes: >> midpoint_pointer = (start_pointer + end_pointer) / 2; >You're right. It's a valid operation. At most, it's "valid" only in some "human semantic" sense. As has been pointed out before, it's not "valid" in the C language at present. How in creation do you define pointer addition? One possible definition is this: 'For pointers P and Q of the same type, both non-null and [WLOG] pointing into the same array, the value of P+Q is defined to be the bit pattern which is the unsigned integral sum of the bit patterns of P and Q.' This is a bad definition, because pANS C quite rightly says nothing about "bit patterns", particularly of pointers, but I'll assume that you know what I mean. Can we just add P and Q's internal bit representation as unsigned integers? No -- even one addition can overflow. For example, suppose that all machine addresses are in the range 0 to 255. Let Start be at 200, and End be at 220 (both are valid addresses). Suppose we want to compute "Total = Start + End;". Start + End = 200 + 220 = "420", which cannot be represented. Assume the result is mod 256 (a la unsigned ops in pANS C) Total is "420" % 256 = 164. So Total/2, which by Peter's reasoning should be the average of Start and End, is 82 -- NOT what was intended. (If you prefer addresses to be -128 to 127, consider Start = 72 and End = 92. The same problem occurs, so I consider only unsigned addition here.) If overflow can occur for just one addition of two valid addresses, there's not much use for the concept. How would you fix this problem? If all data addresses (due to the stack, malloc, or static data) are in the lower half of memory addresses, overflow can't occur in one addition. In this case, though, the stack has to grow upward -- suddenly, the idea of adding pointers ripples all the way back to the hardware designer! Even if data be restricted to low memory (and note that some current implementations put data in high memory), a statement like Total = Loc1 + Loc2 + Loc3; can still cause overflow. If you choose any fixed-size representation for pointers, I can add as many addresses as I like and cause overflow. Perhaps, just like for other data (integers, floating-point), it might be the user's responsibility to avoid overflow. In that case, I have to use Average = (End - Start) / 2 + Start; anyway, or the equivalent for the three-Loc case, and there's no use for pointer addition. Here's another possible definition, more in line with pANS C concepts. Maybe you think it's too restrictive, but I intend to show that even this restrictive definition can't be implemented efficiently. 'Two pointers P and Q may be added if they are of the same type and point into the same array A. If P points at the Ith element of A, and Q points at the Jth element of A, then P+Q has the same type as P and Q, and P+Q points to the (I+J)th element of A, if there is one. Under any other conditions, the addition is undefined.' This has different effects from the previous definition. If P=Q=&Array[0], here P+Q=P=Q. Before, P+Q=P only if there's overflow or if Q's internal bit-pattern representation is all zeroes. Consider an implementation of this. Let Start and End be pointers to elements an array whose first address is Array. If we want to compute "Total = Start + End;", we can compute it something like this: Offset1 = Start - Array; Offset2 = End - Array; Total = Array + Offset1 + Offset2; or just Total = Start + End - Array; If it overflows, the result should be undefined anyway. But how do we know what Array is, given just Start or End? Start and End may have been computed in one function, and passed to a function in a separate file. The only way to know the value of Array is if all pointers are implemented as pairs "(pointer,base)", like "(Start,Array)" and "(End,Array)". Then all pointers are twice as long as they used to be. They take up twice as much space on the stack as arguments. They probably take twice as long to manipulate. One of the uses of pointers is to move around in arrays. But if pointers have to be pairs of addresses, we should use subscripts instead, for many architectures. Pointer arithmetic is generally useful only for moving within arrays. If subscripting is more efficient, why have pointer arithmetic at all? Are there any other possible implementations? Here's how Peter da Silva's card analogy fails: >What's the average of the 7 of clubs and the 9 of clubs? Overflow: if you add them, you get the 16 of clubs, which does not exist. Peter carried more precision in the intermediate results than is permitted in the representation. Knowledge of the base: Peter knows that the low card is the 2 (or ace, if you like). With a bare pointer, all we know is that we're looking at the (Querg) of clubs and the (Querg+2) of clubs. In short, we either have to make each suit arbitrarily large, or we can add only the low cards, or we have to know the low card in each suit, or we have to avoid addition altogether. One question has not yet been answered, and it makes this whole fiasco moot. What additional power does pointer addition give? What can be done with it that can't be done about as easily as with pointer subtraction? The example given, taking an average, is not such a case. To find the point x% of the way between Start and End, you could use x*Start + (1-x)*End (reducing to (Start+End)/2 in the special case x=50%), but you can also use x*(End-Start) + Start which uses only pointer subtraction, is more efficient in the general case, and takes only one extra add in the x=50% case. Are there any other applications in which pointer addition is useful? For extra credit: :-) Suppose all addresses on the machine in question are integers in the range 0 to 255 (like a small VAX), and all integral types (signed and unsigned) take exactly 8 bits. 1. Give a simple implementation of pointer subtraction as defined in pANS C, for P and Q pointers to some type T, ignoring overflow. 2. The difference of two pointers must be storable in some signed integral type. Overflow is not permitted. Give a simple condition on the implementation that prevents overflow. (Hint: If we have an array in memory, "char A[201];", what are the values of "&A[200] - &A[0]" and "&A[0] - &A[200]"? What does this imply about the declaration of A?) 3. I do a little sleight-of-hand above. Peter de Silva's original example was midpoint_pointer = (start_pointer + end_pointer) / 2; but my example was Total = Start + End; How does pANS C justify my substitution? 4. What does "WLOG" mean? -- Tim, the Bizarre and Oddly-Dressed Enchanter Center for ||| Internet, BITNET: mcdaniel@uicsrd.csrd.uiuc.edu Supercomputing ||| UUCP: {uunet,convex,pur-ee}!uiucuxc!uicsrd!mcdaniel Research and ||| ARPANET: mcdaniel%uicsrd@uxc.cso.uiuc.edu Development, ||| CSNET: mcdaniel%uicsrd@uiuc.csnet U of Illinois ||| DECnet: GARCON::"mcdaniel@uicsrd.csrd.uiuc.edu"