Xref: utzoo comp.sys.ibm.pc:22742 comp.sys.intel:638
Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!oliveb!intelca!mipos3!pinkas
From: pinkas@hobbit.intel.com (Israel Pinkas ~)
Newsgroups: comp.sys.ibm.pc,comp.sys.intel
Subject: Re: correct code for pointer subtraction
Message-ID: <PINKAS.89Jan3082456@hobbit.intel.com>
Date: 3 Jan 89 16:24:56 GMT
References: <597@mks.UUCP> <3845@pt.cs.cmu.edu> <18123@santra.UUCP> <142@bms-at.UUCP> <6604@killer.DALLAS.TX.US>
Sender: news@mipos3.intel.com
Organization: Corporate CAD, INTeL Corporation, Santa Clara, CA
Lines: 104
In-reply-to: chasm@killer.DALLAS.TX.US's message of 31 Dec 88 02:25:58 GMT


Please note:  I am on my own here.  I work for Intel, but do not speak for
them.


In article <6604@killer.DALLAS.TX.US> chasm@killer.DALLAS.TX.US (Charles Marslett) writes:

>  In article <142@bms-at.UUCP>, stuart@bms-at.UUCP (Stuart Gathman) writes:
>   > In article <18123@santra.UUCP>, tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
>   > 
>   > > The same error occurs in the following program 
>   > > (with Turbo C 2.0 as well as MSC 5.0):
>   > 
>   > > main()
>   > > {
>   > >         static int a[30000];
>   > >         printf("%d\n",&a[30000]-a);
>   > > }
>   > 
>   > > output:  -2768
>   > 
>   > This is entirely correct.  The difference of two pointers is an *int*.

>  And unless you have a 15-bit computer, 30000 is a very representable *INT*,
>  so please pay attention to the discussion before asserting something.  The
>  compiler is generating a VERY WRONG ANSWER.

The compiler is generating a correct answer.  There is an overflow in
there.  Remember, on a PC, ints are two bytes.  Let's ignore the fact that
there is no a[30000] (and that taking its address is invalid).  a[30000] is
offset from a[0] by 60,000 bytes.  The normal code for pointer subtraction
is to subtract the pointers and divide by the size of the object.  Since
many objects have a size that is a power of two, the compiler can often
optimize by using a shift.

However, since the difference is stored in a 16-bit int, 60,000 is taken to
be a negative number (60,000 - 65,536 = -5536).  Dividing this by two (the
size of an int) gives -2768.
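
The arithmetic can be checked on any machine by simulating the 16-bit
wraparound explicitly.  This is only a sketch: the helper names wrap16 and
ptrdiff16 are made up for illustration, and the code models the
subtract-then-divide sequence described above, not the instructions that
Turbo C or MSC actually emit.

```c
/* Hypothetical helper: reinterpret the low 16 bits of v as a signed
   two's-complement value, the way a 16-bit int would hold it. */
static long wrap16(unsigned long v)
{
    v &= 0xFFFFUL;
    return (v >= 0x8000UL) ? (long)v - 65536L : (long)v;
}

/* Hypothetical model of 16-bit pointer subtraction: subtract the byte
   offsets modulo 65536, treat the result as a signed int, and only
   then divide by the element size. */
static long ptrdiff16(unsigned long hi_off, unsigned long lo_off,
                      long elem_size)
{
    long bytes = wrap16(hi_off - lo_off);
    return bytes / elem_size;
}
```

With byte offsets 60000 and 0 and an element size of 2, ptrdiff16 returns
-2768, the value both compilers printed.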

Note that there are similar problems when adding an int to a pointer.  They
just don't show up as often, as the pointer is treated specially.

There is no real solution for this.  You might get the desired result with
the following code:

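	#include <stdio.h>

	int main(void)
	{
	    static int a[30000];    /* static, as in the original program */
	    /* The cast is applied after the subtraction, so it helps only if
	       the compiler computes the difference in more than 16 bits. */
	    printf("%ld\n", (long) (&a[30000] - a));
	    return 0;
	}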

Of course, you could always just subtract the two indices.  You could also
convert the two pointers to (char *), subtract and convert to unsigned, and
divide by sizeof(object).
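
The (char *) approach might look like the sketch below.  The helper name
elems_between is hypothetical, and size_t stands in for "unsigned" on the
assumption that the compiler supplies <stddef.h>.  It only gives a
meaningful answer when the second pointer is not below the first.

```c
#include <stddef.h>

/* Hypothetical helper: number of ints between lo and hi, where both
   point into (or one past the end of) the same array and hi >= lo. */
static unsigned long elems_between(const int *lo, const int *hi)
{
    /* Subtract as byte pointers, then reinterpret the byte count as
       unsigned so that 60,000 is not mistaken for -5536. */
    size_t bytes = (size_t)((const char *)hi - (const char *)lo);

    /* Only now divide by the element size. */
    return (unsigned long)(bytes / sizeof *lo);
}
```

The point of the ordering is that the conversion to unsigned happens before
the division, while the byte count still fits in 16 bits.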

>   > If you want an unsigned difference, you need to cast to unsigned
>   > (and/or use %u in the printf).  If the difference were defined as
>   > unsigned, how would you indicate negative differences?  If you
>   > make the difference long, all the related arithmetic gets promoted
>   > also for a big performance hit.  The solution is simple, if you
>   > want an unsigned ptrdiff, cast or assign to unsigned.

>  The result cast to an unsigned is 62768, still not even close to the
>  correct value of 30000.  There are two viable solutions:  you can write
>  your own assembly language (or C code, even) to calculate the proper result
>  or you can ignore the issue and assume the size of a segment on the Intel
>  architecture is 32K.  I have used both solutions.

Treating it as an unsigned is wrong, as the next time through you might
want to know a - &a[30000].
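
The 62768 figure is just the same 16 bits read as unsigned.  A small
sketch, with a made-up helper name, shows the reinterpretation:

```c
/* Hypothetical helper: the value a 16-bit unsigned would hold after
   an int result v is cast to it (only the low 16 bits survive). */
static unsigned long as_unsigned16(long v)
{
    return (unsigned long)v & 0xFFFFUL;
}
```

Here as_unsigned16(-2768) is 62768.  Going the other way, and assuming the
same subtract-then-divide sequence, a - &a[30000] has a byte difference of
-60,000, which wraps to +5536 and divides to 2768, so neither
interpretation recovers -30000.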

This problem is not inherent in the fact that the 80x86 uses segments.  It
is a result of the fact that sizeof(int *) > sizeof(int).  (Actually, the
size of the largest pointer type.)  Since the only class of compilers that
has this problem is the MS-DOS compilers, people blame the issue on Intel.
The only limitation that the 8088/8086 segments impose is with respect to
the size of an object, either code or data.  It takes more effort to
manipulate an object that is greater than 64K, and a compiler would have to
be very intelligent to generate code for a single procedure that was >64K.

>   > Don't flame the 8086 either.  The same thing happens in 32-bit machines
>   > (just much less often).  16 bits is 16 bits, and segments are not
>   > the problem.  The VAX restricts user programs to 31-bit address space
>   > to avoid this.

>  Actually, in a 32-bit machine the problem is probably more serious if
>  we assume a real 32-bit address, since it may well not support 33+ bit
>  arithmetic even as well as Intel boxes support 17+ bit arithmetic.

On machines where sizeof(int) >= sizeof(int *), this is never a problem.
On the VAX, 68K, Sparc, 80386, and most other machines that I have worked
with, ints are 32 bits.  Since most machines do not have 2G of virtual
memory, the issue never comes up.
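
On such a machine the original subtraction needs no tricks, in either
direction.  A minimal sketch, assuming a compiler that provides ptrdiff_t
in <stddef.h> (that comes from ANSI C, not from anything the programs above
rely on); the helper name signed_distance is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: signed element distance from p to q, both
   pointing into (or one past the end of) the same int array. */
static long signed_distance(const int *p, const int *q)
{
    ptrdiff_t d = q - p;    /* type of a pointer difference */
    return (long)d;
}
```

With 32-bit or wider ints and pointers, a 30000-element array gives 30000
one way and -30000 the other.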

-Israel
--
--------------------------------------
Disclaimer: The above are my personal opinions, and in no way represent
the opinions of Intel Corporation.  In no way should the above be taken
to be a statement of Intel.

UUCP:	{amdcad,decwrl,hplabs,oliveb,pur-ee,qantel}!intelca!mipos3!cad001!pinkas
ARPA:	pinkas%cad001.intel.com@relay.cs.net
CSNET:	pinkas@cad001.intel.com