Xref: utzoo comp.lang.c:15401 comp.std.c:664
Path: utzoo!utgpu!watmath!clyde!ima!haddock!karl
From: karl@haddock.ima.isc.com (Karl Heuer)
Newsgroups: comp.lang.c,comp.std.c
Subject: Re: How to use toupper()
Message-ID: <11391@haddock.ima.isc.com>
Date: 11 Jan 89 21:32:35 GMT
References: <2537@xyzzy.UUCP> <189@becker.UUCP> <9256@smoke.BRL.MIL> <2581@ficc.uu.net> <1989Jan6.231955.7445@sq.uucp>
Reply-To: karl@haddock.ima.isc.com (Karl Heuer)
Followup-To: comp.std.c
Organization: Interactive Systems, Boston
Lines: 37

This has mostly reduced to an ANSI-C-specific issue, so I'm redirecting
followups to comp.std.c.

In article <1989Jan6.231955.7445@sq.uucp> msb@sq.com (Mark Brader) writes:
>So for now, the best compromise seems to be:
>#ifdef __STDC__                /* [corrected --kwzh] */
>    if (*p >= 0) *p = toupper(*p);                      /* Version 2 */
>#else
>    if (isascii(*p) && islower(*p)) *p = toupper(*p);   /* Version 5 */
>#endif

As Mark already pointed out, Version 2 can break in an international
environment.  My recommendation (in a parallel article) was

	*p = toupper((unsigned char)*p);                    /* Version 6 */

which has the subtle flaw that, if plain chars are signed and the result
of toupper() doesn't fit, ANSI C does not guarantee the integrity of the
value (the conversion is implementation-defined).

Mark further points out in e-mail:
>The trouble is that while Version 2 can break for some characters in the
>international environment, Version 6 can break for ALL characters in a
>vanilla environment ("C" locale)!

Well, not *all* characters; just those that appear negative (and hence
don't fit when converted back from unsigned char).  And this set is
guaranteed to exclude the minimal execution character set.  But the code
as written could still produce surprises on a sufficiently weird
implementation that is still within the letter of the Standard.

>The best you can do is to avoid "char" altogether and use "unsigned char".
>You probably have to do it throughout the program, in fact.
If the program has to be strictly conforming, you may be right.  (But then
string literals, and functions that expect `char *' arguments, may screw
things up; casting the pointers ought to be safe, though.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint