Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!caen!hellgate.utah.edu!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: strcmp
Message-ID: <14498@dog.ee.lbl.gov>
Date: 19 Jun 91 19:33:22 GMT
References: <2695@m1.cs.man.ac.uk> <1991Jun18.074029.12226@panix.uucp> <1991Jun18.153653.1494@zoo.toronto.edu> <14421@dog.ee.lbl.gov>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 58
X-Local-Date: Wed, 19 Jun 91 12:33:22 PDT

In article <14421@dog.ee.lbl.gov> I wrote:
>	int strcmp(char *s1, char *s2) {

This should be

	int strcmp(const char *s1, const char *s2) {

Thanks to Matthew Farwell for pointing this out.

>		return (*(unsigned char *)--s1 - *(unsigned char *)s2);
>... this assumes that the subtraction will not overflow.

The word `overflow' is wrong.  The subtraction cannot overflow.
There are several cases:

 1. K&R C: `unsigned char's widen to `unsigned int's.  Then the
    subtraction is done in unsigned arithmetic, which is modular
    arithmetic in 2^(number of bits).

 2. ANSI C:

    2a: sizeof(char) < sizeof(int).  Then `unsigned char's widen to
        signed `int's that are nonetheless nonnegative.  If the
        underlying system is 2's complement, 1's complement, or
        sign-magnitude, the subtraction will not overflow.  (If it
        is something else, I am not sure what happens.  With any
        luck, it does not overflow.)

    2b: sizeof(char) == sizeof(int).  Then `unsigned char's widen to
        `unsigned int's, and the subtraction is done in unsigned
        arithmetic, just as in K&R C.

(Note the complications in 2a, due to the so-called `value preserving'
rules, which are Wrong, but are now Engraved in Stone.  Oh well.)

The complication I was considering in <14421@dog.ee.lbl.gov> occurs in
case 2b.  Suppose, for instance, that char and int are both 16 bits,
and that we have two strings made up of characters (65529,0) and (1,0)
respectively.  Then the comparison will return

	(int)((unsigned char)65529 - (unsigned char)1)

(the `int' cast is provided by the `return').  This will be equal to

	(int)((unsigned)65529 - (unsigned)1)

or

	(int)((unsigned)65528)

or, in 2's complement, -8.  A return value of -8 says that s1 < s2,
yet 65529 > 1.

This is what I called `overflow' above.  I am not sure *what* to call
it, but `overflow' is wrong.  Thanks to Lasse H. Ostergaard for noting
this.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov
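
For anyone who wants to try case 2b without a machine on which char and
int are the same width, here is a rough sketch that models it with
fixed-width types.  It is an illustration only, not the library code
under discussion.  The names diff_by_subtraction, diff_by_comparison,
as_int16 and check_example are made up for this example, and uint16_t
merely stands in for a hypothetical 16-bit `unsigned char' that is as
wide as `int'.  The second function shows one common way to sidestep
the wraparound, namely comparing instead of subtracting.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed model: char and int are both 16 bits, so uint16_t plays the
 * role of both `unsigned char' and `unsigned int'.
 */

/* Reinterpret a 16-bit unsigned value as 2's complement, portably. */
static long as_int16(uint16_t u)
{
    return u > 32767 ? (long)u - 65536L : (long)u;
}

/*
 * What `return (c1 - c2);' yields in the model: the difference is
 * reduced modulo 2^16 (unsigned arithmetic), then narrowed to `int'.
 */
long diff_by_subtraction(uint16_t c1, uint16_t c2)
{
    uint16_t d = (uint16_t)(c1 - c2);   /* wraps modulo 2^16 */

    return as_int16(d);
}

/* Alternative: derive the sign from comparisons, so nothing can wrap. */
int diff_by_comparison(uint16_t c1, uint16_t c2)
{
    return (c1 > c2) - (c1 < c2);
}

/* Usage check: 65529 > 1, yet the subtraction reports a negative value. */
void check_example(void)
{
    assert(diff_by_subtraction(65529u, 1u) == -8);
    assert(diff_by_comparison(65529u, 1u) == 1);
}
```

The comparison form gives up the magnitude of the difference, but
strcmp promises only the sign of its result, so nothing is lost.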