Path: utzoo!utgpu!water!watmath!clyde!rutgers!gatech!bloom-beacon!husc6!mailrus!umix!uunet!ccicpg!felix!dhw68k!david
From: david@dhw68k.cts.com (David H. Wolfskill)
Newsgroups: comp.lang.c
Subject: Character handling functions -- Jan 88, dpANS
Keywords: isalpha islower isupper locale confusion
Message-ID: <6192@dhw68k.cts.com>
Date: 25 Mar 88 21:34:10 GMT
Reply-To: david@dhw68k.cts.com (David H. Wolfskill)
Organization: Wolfskill residence; Anaheim, CA (USA)
Lines: 75

In reading a copy of the 11 January, 1988 dpANS C Standard
(X3J11/88-001), I ran across something with respect to the character
handling routines in the library that I suspect I do not understand
adequately.

I realize that an attempt is made (in the draft standard) to
accommodate alphabets other than the English one, and that the use of
such an alphabet is not the default (but is specified by selecting a
non-default "locale"; the default locale is the "C" locale).

In section 4.3.1.2, the description of the "isalpha" function reads:

    The isalpha function tests for any character for which isupper or
    islower is true, or any of an implementation-defined set of
    characters for which none of iscntrl, isdigit, ispunct, or isspace
    is true.  In the "C" locale, isalpha returns true only for the
    characters for which isupper or islower is true.

In section 4.3.1.6, the description of the "islower" function reads:

    The islower function tests for any lower-case letter or any of an
    implementation-defined set of characters for which none of
    iscntrl, isdigit, ispunct, or isspace is true.  In the "C" locale,
    islower returns true only for the characters defined as lower-case
    letters (as defined in [section]2.2.1).

In section 4.3.1.10, the description of the "isupper" function reads:

    The isupper function tests for any upper-case letter or any of an
    implementation-defined set of characters for which none of
    iscntrl, isdigit, ispunct, or isspace is true.  In the "C" locale,
    isupper returns true only for the characters defined as upper-case
    letters (as defined in [section]2.2.1).

For the "C" locale, I see no problem whatsoever.  Since this is
probably the only locale I am likely to use, the issue I am bringing up
does not directly affect me; nevertheless, I would like to determine
whether or not my present understanding is shared by others.

I perceive two concerns:

1) It would seem to be possible for a character -- interpreted in a
   locale other than the "C" locale -- to cause isalpha to return
   true, yet cause both isupper and islower to fail to return true.
   Is this both expected and reasonable?

2) Similarly, it would seem to be possible for a character to cause
   isalpha to fail to return true, and yet cause either (or both!) of
   isupper and islower to return true.  Likewise, is this both
   expected and reasonable?

Here is a (partial) list of approaches (assuming that the cited wording
needs to be fixed):

1) Include "islower" in the "stop list" for "isupper", and vice versa.

2) Specify that a character that causes isalpha to return true must
   cause precisely one of islower or isupper to return true.

3) Specify that a character that causes either islower or isupper to
   return true must also cause isalpha to return true.

Another approach, of course, would be to state explicitly (perhaps in
the Rationale) that the above-described behavior really is desired.
(Perhaps it's just my provincialism, but this really does seem a bit
unlikely to me.)

I look forward to seeing your comments on the above,
david
-- 
David H. Wolfskill
uucp: ...{trwrb,hplabs}!felix!dhw68k!david	InterNet: david@dhw68k.cts.com
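
One way to see how a particular implementation answers the two concerns
raised above is to walk every unsigned char value and compare the three
predicates.  What follows is only a sketch, not anything taken from the
draft: the names "tally_ctype" and "struct ctype_tally" are made up for
illustration, and the counts depend entirely on the LC_CTYPE locale in
effect when the function is called.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper type: tallies of characters that break the
   relationships one might expect among isalpha, isupper and islower. */
struct ctype_tally {
    int alpha_without_case;  /* concern 1: isalpha, but neither case  */
    int case_without_alpha;  /* concern 2: some case, but not isalpha */
    int both_cases;          /* isupper and islower both true         */
};

/* Hypothetical helper: scan every value representable as unsigned char
   under the current LC_CTYPE locale and count each kind of mismatch. */
void tally_ctype(struct ctype_tally *out)
{
    int c;

    assert(out != NULL);
    out->alpha_without_case = 0;
    out->case_without_alpha = 0;
    out->both_cases = 0;

    for (c = 0; c <= UCHAR_MAX; c++) {
        /* Normalize to 0/1; the functions only promise nonzero for true. */
        int a = isalpha(c) != 0;
        int u = isupper(c) != 0;
        int l = islower(c) != 0;

        if (a && !u && !l)
            out->alpha_without_case++;
        if (!a && (u || l))
            out->case_without_alpha++;
        if (u && l)
            out->both_cases++;
    }
}
```

In the "C" locale all three counts should come out zero, which agrees
with the draft wording quoted above.  To probe some other locale, call
setlocale(LC_CTYPE, "") (or name a specific locale, if the
implementation provides one) before calling tally_ctype; a nonzero
count there would be an instance of the behavior in question.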