Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!henry
From: henry@utzoo.UUCP (Henry Spencer)
Newsgroups: net.bugs.v7,net.bugs.4bsd,net.unix-wizards
Subject: bug in strcmp+strncmp
Message-ID: <3270@utzoo.UUCP>
Date: Thu, 20-Oct-83 17:57:03 EDT
Article-I.D.: utzoo.3270
Posted: Thu Oct 20 17:57:03 1983
Date-Received: Thu, 20-Oct-83 17:57:03 EDT
Organization: U of Toronto Zoology
Lines: 41

There is a long-standing but obscure bug in strcmp() and strncmp() in
(at least) V7 and 4.1BSD.  To discover it, try the following:

	main()
	{
		if (strcmp("a\203", "a") <= 0)
			printf("Oops.\n");
	}

Note that the two strings are equal up to the point where one of them
ends, so by the definition of lexicographic ordering the longer one is
greater.  But strcmp() claims it is the lesser.  This "works" only on
machines where characters are signed.

The problem is obvious when you inspect the code:  strcmp's computation
of a return code takes a shortcut that assumes the end-of-string NUL
collates low with respect to any other character.  This is not true on
a signed-char machine, where a character such as \203 is negative and
therefore collates below NUL.

To fix this, add the following before the routine:

	/*
	 * CHARBITS should be defined only if the compiler lacks "unsigned char".
	 * It should be a mask, e.g. 0377 for an 8-bit machine.
	 */
	#ifndef CHARBITS
	# define UNSCHAR(c) ((unsigned char)(c))
	#else
	# define UNSCHAR(c) ((c)&CHARBITS)
	#endif

Change the return at the end to:

	return(UNSCHAR(*s1) - UNSCHAR(*--s2));

If your compiler lacks "unsigned char", also define CHARBITS for the
compilation (say, -DCHARBITS=0377).  Then make the same changes to
strncmp(), which takes the same shortcut and has the same bug.

Please don't try to tell me that the note in the BUGS section about
using the native character comparison excuses this.  The NUL is an
end-marker, not a regular character of the string.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry