Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!caen!hellgate.utah.edu!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: strcmp
Message-ID: <14498@dog.ee.lbl.gov>
Date: 19 Jun 91 19:33:22 GMT
References: <2695@m1.cs.man.ac.uk> <1991Jun18.074029.12226@panix.uucp> <1991Jun18.153653.1494@zoo.toronto.edu> <14421@dog.ee.lbl.gov>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 58
X-Local-Date: Wed, 19 Jun 91 12:33:22 PDT

In article <14421@dog.ee.lbl.gov> I wrote:
>	int strcmp(char *s1, char *s2) {

This should be

	int strcmp(const char *s1, const char *s2) {

Thanks to Matthew Farwell <dylan@ibmpcug.co.uk> for pointing this out.

>		return (*(unsigned char *)--s1 - *(unsigned char *)s2);
>... this assumes that the subtraction will not overflow.

The word `overflow' is wrong.  The subtraction cannot overflow.  There
are several cases:

	1. K&R C: `unsigned char's widen to `unsigned int's.
	   Then the subtraction is done in unsigned arithmetic,
	   which is modular arithmetic in 2^(number of bits).

	2. ANSI C:
	   2a:	sizeof(char) < sizeof(int).  Then `unsigned char's
		widen to signed `int's that are nonetheless
		nonnegative.  If the underlying system is 2's
		complement, 1's complement, or sign-magnitude, the
		subtraction will not overflow.  (If it is something
		else, I am not sure what happens.  With any luck,
		it does not overflow.)

	   2b:	sizeof(char) == sizeof(int).  Then `unsigned char's
		widen to `unsigned int's, and the subtraction is
		done in unsigned arithmetic, just as in K&R C.

(Note the complications in 2a, due to the so-called `value preserving'
rules, which are Wrong, but are now Engraved in Stone.  Oh well.)

The complication I was onsidering in <14421@dog.ee.lbl.gov> occurs
in case 2b.  Suppose, for instance, that char and int are both 16
bits, and that we have two strings made up of characters (32761,0)
and (1,0) respectively.  Then the comparison will return

	(int)((unsigned char)32761 - (unsigned char)1)

(the `int' cast is provided by the `return').  This will be equal to

	(int)((unsigned)32761 - (unsigned)1)

or

	(int)((unsigned)32760)

or, in 2's complement, -8.  A return value of -8 says that s1 < s2, yet
32761 > 1.  This is what I called `overflow' above.  I am not sure *what*
to call it, but `overflow' is wrong.

Thanks to Lasse H. Ostergaard <lhoe@id.dth.dk> for noting this.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov