Xref: utzoo comp.sys.ibm.pc:22742 comp.sys.intel:638
Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!oliveb!intelca!mipos3!pinkas
From: pinkas@hobbit.intel.com (Israel Pinkas ~)
Newsgroups: comp.sys.ibm.pc,comp.sys.intel
Subject: Re: correct code for pointer subtraction
Message-ID:
Date: 3 Jan 89 16:24:56 GMT
References: <597@mks.UUCP> <3845@pt.cs.cmu.edu> <18123@santra.UUCP> <142@bms-at.UUCP> <6604@killer.DALLAS.TX.US>
Sender: news@mipos3.intel.com
Organization: Corporate CAD, INTeL Corporation, Santa Clara, CA
Lines: 104
In-reply-to: chasm@killer.DALLAS.TX.US's message of 31 Dec 88 02:25:58 GMT

Please note: I am on my own here. I work for Intel, but do not speak for
them.

In article <6604@killer.DALLAS.TX.US> chasm@killer.DALLAS.TX.US (Charles Marslett) writes:

> In article <142@bms-at.UUCP>, stuart@bms-at.UUCP (Stuart Gathman) writes:
> > In article <18123@santra.UUCP>, tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
> >
> > > The same error occurs in the following program
> > > (with Turbo C 2.0 as well as MSC 5.0):
> > >
> > > main()
> > > {
> > > static int a[30000];
> > > printf("%d\n",&a[30000]-a);
> > > }
> > >
> > > output: -2768
> >
> > This is entirely correct. The difference of two pointers is an *int*.

> And unless you have a 15-bit computer, 30000 is a very representable *INT*,
> so please pay attention to the discussion before asserting something. The
> compiler is generating a VERY WRONG ANSWER.

The compiler is generating a correct answer. There is an overflow in
there. Remember, on a PC, ints are two bytes. Let's ignore the fact that
there is no a[30000] (and that taking its address is invalid). a[30000]
is offset from a[0] by 60,000 bytes. The normal code for pointer
subtraction is to subtract the pointers and divide by the size of the
object. Since many objects have a size that is a power of two, the
compiler can often optimize by using a shift. However, since the
difference is stored in a 16-bit int, 60,000 is taken to be a negative
number (60,000 - 65,536 = -5536).
Dividing this by two (the size of an int) gives -2768.

Note that there are similar problems when adding an int to a pointer.
They just don't show up as often, as the pointer is treated specially.

There is no real solution for this. You might get the desired result
with the following code:

    main()
    {
        int a[30000];
        printf("%ld\n", (long) (&a[30000] - a));
    }

Of course, you could always just subtract the two indices. You could
also convert the two pointers to (char *), subtract and convert to
unsigned, and divide by sizeof(object).

> > If you want an unsigned difference, you need to cast to unsigned
> > (and/or use %u in the printf). If the difference were defined as
> > unsigned, how would you indicate negative differences? If you
> > make the difference long, all the related arithmetic gets promoted
> > also for a big performance hit. The solution is simple, if you
> > want an unsigned ptrdiff, cast or assign to unsigned.

> The result cast to an unsigned is 62768, still not even close to the
> correct value of 30000. There are two viable solutions: you can write
> your own assembly language (or C code, even) to calculate the proper result
> or you can ignore the issue and assume the size of a segment on the Intel
> architecture is 32K. I have used both solutions.

Treating it as an unsigned is wrong, as the next time through you might
want to know a - &a[30000].

This problem is not inherent in the fact that the 80x86 uses segments.
It is a result of the fact that sizeof(int *) > sizeof(int). (Actually,
the size of the largest pointer type.) Since the only class of compilers
that have this problem are the MS-DOS compilers, this is why people
blame the issue on Intel.

The only limitation that the 8088/8086 segments impose is with respect
to the size of an object, either code or data. It takes more effort to
manipulate an object that is greater than 64K, and a compiler would have
to be very intelligent to generate code for a single procedure that was
>64K.

> > Don't flame the 8086 either. The same thing happens in 32-bit machines
> > (just much less often). 16 bits is 16 bits, and segments are not
> > the problem. The VAX restricts user programs to 31-bit address space
> > to avoid this.

> Actually, in a 32-bit machine the problem is probably more serious if
> we assume a real 32-bit address, since it may well not support 33+ bit
> arithmetic even as well as Intel boxes support 17+ bit arithmetic.

On machines where sizeof(int) >= sizeof(int *), this is never a problem.
On the VAX, 68K, Sparc, 80386, and most other machines that I have
worked with, ints are 32 bits. Since most machines do not have 2G of
virtual memory, the issue never comes up.

-Israel
--
--------------------------------------
Disclaimer: The above are my personal opinions, and in no way represent
the opinions of Intel Corporation. In no way should the above be taken
to be a statement of Intel.
UUCP:  {amdcad,decwrl,hplabs,oliveb,pur-ee,qantel}!intelca!mipos3!cad001!pinkas
ARPA:  pinkas%cad001.intel.com@relay.cs.net
CSNET: pinkas@cad001.intel.com