Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!mailrus!iuvax!rutgers!njin!princeton!phoenix!rjchen From: rjchen@phoenix.Princeton.EDU (Raymond Juimong Chen) Newsgroups: comp.sys.ibm.pc Subject: Re: correct code for pointer subtraction Message-ID: <5137@phoenix.Princeton.EDU> Date: 4 Jan 89 22:22:44 GMT References: Organization: Princeton University, NJ Lines: 113 I really tried to stay out of this, but I want to clarify some things that people seem to be missing, as well as provide a counterexample to an assertion made. > The compiler is generating a correct answer. Who is defining "correct"? If you want K&R to be the final word on the "definition" of the C language, then I'm afraid that the compiler is not generating a "correct" answer. K&R (old version) page 98 explicitly states "p-q is the numebr of elements between p and q". And in section 7.4, "If two pointers to objects of the same type are subtracted, the result is ... an int representing the number of objects separating the pointed-to objects." I suspect that the ANSI draft says the same thing. Therefore, the "correct" answer is 30000. I shall henceforth place the word "correct" in quotation marks to emphasize that what I am referring to is the value that should be returned, as defined by K&R. I am not insisting that this is the most reasonable result, merely that it is what K&R decrees the answer should be. > There is an overflow in there. Exactly. And the compiler should be smart enough to emit code to handle the overflow so as to generate the "correct" answer. > However, since the difference is stored in an int, 60,000 is taken to be a > negaive number (-5536). Dividing this by two (the size of int) gives -2768. It seems you are simply giving a rational explanation for the answer; justifying why the answer of -2768 is not unexpected, but not justifying its correctness. (As stated above, the "correct" answer is DEFINED to be 30000.) The additional code is really very simply. If the carry is clear, then the diference is positive, so we just perform a UNSIGNED division on the resulting difference. If the carry is set, then the difference is negative, and the accumulator contains the two-s complement of the absolulte value of the byte difference. So we negate the accumulator, perform an unsigned division, then negate the result. As the counterexample below illustrates, this will not work if the object is only one byte wide, since an int cannot hold all the possible pointer differences. > There is no real solution for this. You might get the desired result with > the following code: > > main() > { > int a[30000]; > printf("%ld\n", (long) (&a[30000] - a)); > } But I suspect most people would argue that the original code (which used the line printf("%ld\n", &a[30000] - a); ) is ANSI-conformant, and therefore should yield the "correct" answer. > This problem is not inherent in the fact that the 80x86 uses segments. Agreed. > It > is a result of the fact the sizeof(int *) > sizeof(int). (Actually, the > aize of the lagrst pointer type.) Untrue. Even if sizeof(int *) == sizeof(int), we would still have problems. Suppose sizeof(int) == sizeof(int *) == 4. And suppose you have an array of chars that takes up 3/4 of the computer's memory. Let p be a pointer to the start of the array, and q a pointer to the end of the array. Let M be 2^sizeof(int), so that an unsigned can represent the values 0 ... M-1, and an int can represent (if using two's complement) -(M/2) ... M/2 - 1. And the computer has M bytes of memory. Let d be the value (0.75 * M). Then the following possible "correct" return values exist: q - p == d p - q == -d So we have to be able to represent 2d-1 different values. But 2d-1 > M. Therefore, you cannot handle all possible return values with just an int, since an int can represent only M possible values. > On machines where sizeof(int) >= sizeof(int *), this is never a problem. See above counterexample. As mentioned many times before, you need your int to be at least one bit bigger than your pointers. This becomes clear when you realize that if an array gobbles up all but one byte of memory, then the difference betwen its head and its tail will be +-(M-1), and you clearly require (lg M)+1 bits to represent that many different values. ------------------------- Let me say that I understand why the implementors of compilers have not bothered to worry about this far-out case: It makes your code larger (since you have to perform special tests for the carry bit) to handle cases that don't pop up very often. Since benchmarks compare both speed and size, I can see the following exchange: (P == a programmer on the compiler development team. M = an executive in the marketing department.) P: Well, we need to add this extra code to handle this special case which I admit doesn't happen very often. M: Does this special case occur in any of the standard benchmarks? P: No. M: Will it hurt our showing in the standard benchmarks? P: Yes. M: Leave it out. ------------------------ Respectfully, -- Raymond Chen UUCP: ...allegra!princeton!{phoenix|pucc}!rjchen BITNET: rjchen@phoenix.UUCP, rjchen@pucc ARPA: rjchen@phoenix.PRINCETON.EDU, rjchen@pucc.PRINCETON.EDU "Say something, please! ('Yes' would be best.)" - The Doctor