Path: utzoo!utgpu!watmath!rbutterworth
From: rbutterworth@watmath.waterloo.edu (Ray Butterworth)
Newsgroups: comp.std.c
Subject: Re: How to use toupper()
Message-ID: <23156@watmath.waterloo.edu>
Date: 19 Jan 89 15:30:29 GMT
References: <2537@xyzzy.UUCP> <189@becker.UUCP> <9256@smoke.BRL.MIL> <2792@xyzzy.UUCP>
Organization: U of Waterloo, Ontario
Lines: 87

getchar() presents yet another aspect of the problem.  Consider:

    switch (getchar()) {
    case EOF: ...
    case 'C': ...
    }

If 'C' is a character with the high bit set and plain char is signed, the constant 'C' sign-extends to a negative int, while getchar() returns the same character as a positive value, so that case can never match.

> karl@haddock.ima.isc.com (Karl Heuer)
> > msb@sq.com (Mark Brader)
> > The best you can do is to avoid "char" altogether and use "unsigned char".
> > You probably have to do it throughout the program, in fact.
> If the program has to be strictly conforming, you may be right.  (But then
> string literals, and functions that expect `char *' arguments, may screw
> things up; casting the pointers ought to be safe, though.)

That is, you will have to write (unsigned char *)"string" or (unsigned char)'C' whenever you use a literal, and you'll have to cast every argument you pass to a standard ANSI function that expects a (char *).  This is true for any application that might be used in a locale with a non-ASCII character set and that wants to be portable to any conforming ANSI compiler that might have chosen to treat chars as signed.

In general, though, if a compiler is expected to produce programs that work with a local character set containing characters with the high bit set, it will almost certainly have to treat (char) as (unsigned char).  Anyone who really wants to use chars for signed arithmetic can now explicitly ask for (signed char).

The Standard should have explicitly stated that (char) is identical to (unsigned char), and mentioned that compilers may, as an extension, treat chars as signed for backward compatibility.
At the least, treating plain (char) as signed should have been listed as a deprecated feature that will probably be eliminated in future versions of the Standard.  In practice I'm sure that is the way it will eventually turn out.  I can't imagine any European ANSI compiler having (char) signed; it would provide far too little benefit and far too many complications.

Much of this was mentioned to the Committee.  For example, Letter P04 to the Second Public Review contained:

+ 4.3 Character Handling:
+ Most of these functions don't work for signed char values if
+ the upper bit is on.  Is it unreasonable to expect that with
+     char c[10];
+     int i;
+     c[0] = i = getchar();
+ the function calls
+     isxxx(*c)
+ and
+     isxxx(i)
+ should behave the same way if "i" is not EOF?  This is not difficult
+ to do, and there certainly can't be any existing code that depends on
+ the described behavior.  Why not state that if the argument is not
+ EOF, the result will be the same as if the argument were cast to
+ unsigned char.  This would also remove the need for an equivalent to
+ the "isascii" function.

Perhaps I overestimated their abilities when I said "is not difficult".  Their response was:

+ This was considered a request for information, not an issue.

Well, it certainly looks like an issue to me.

+ It was never intended that they do so.  If you pass a signed char
+ argument and the sign is extended, the resulting value will not fit
+ in an unsigned char, as required.

Exactly.  I'm saying that you don't need to require it.  Drop that requirement and say that they are only defined to work on values that can be returned by getchar().

+ Your suggestion would require the functions to cast their
+ argument to unsigned char if it is non-EOF.

No, it wouldn't.

+ This would require macro versions to evaluate their argument more
+ than once (once to test for EOF and once to cast them), rendering
+ them unsafe.

No, it would not require that macros evaluate their argument more than once.
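At worst it would require defining EOF as some negative value other than -1, one low enough (below -128 with 8-bit chars) that no sign-extended char can equal it, so that a single table lookup covers EOF and both forms of every character.  That is something the Standard explicitly allows, since it requires only that EOF be negative.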