Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!caip!rutgers!nike!ucbcad!zen!cory.Berkeley.EDU!chapman
From: chapman@cory.Berkeley.EDU (Brent Chapman)
Newsgroups: net.lang.c,net.micro.pc
Subject: Re: Signed char - What Foolishness Is This!
Message-ID: <643@zen.BERKELEY.EDU>
Date: Sat, 18-Oct-86 19:31:02 EDT
Article-I.D.: zen.643
Posted: Sat Oct 18 19:31:02 1986
Date-Received: Tue, 21-Oct-86 21:02:07 EDT
References: <8719@duke.duke.UUCP>
Sender: news@zen.BERKELEY.EDU
Reply-To: chapman@cory.Berkeley.EDU.UUCP (Brent Chapman)
Organization: UNIXversity of California at Berkeley
Lines: 75
Xref: watmath net.lang.c:10762 net.micro.pc:10553

In article <8719@duke.duke.UUCP> jwg@duke.UUCP (Jeffrey William Gillette) writes:
>MSC 4.0 defaults 'char' to 'signed char'.

[ it defaulted to 'unsigned char' in previous versions of MSC -- Brent]

[ details relating to a gotcha in header files, because Microsoft didn't cast a (possibly) negative char value into an unsigned value when using it to index an array, deleted ]

>What possible justification is there for this default? Is not
>'char' primarily a logical (as opposed to mathematical) quantity? What
>I mean is, what is the definition of a negative 'a'? I can understand
>the desirability of allowing 'signed char' for gonzo programmers who
>won't use 'short', or who want to risk future compatibility of their
>code on the bet that useful characters will always remain 7-bit entities.

This brings up some interesting questions and ambiguities concerning K&R's definition of C. I haven't seen the proposed ANSI standard, so I can't comment on it. But K&R will do to illustrate the ambiguities; perhaps someone else can point out if and how the proposed standard deals with them.

On page 34, K&R define a 'char' to be "a single byte, capable of holding one character in the local character set." On page 40, they say "The language does not specify whether variables of type char are signed or unsigned quantities."
This seems to imply that the implementor is free to choose the default that he feels best suits his implementation. On most machines, this is a moot point, since most machines only use the 0 to 127 range for character values, which is available regardless of whether the char is signed or unsigned.

On the PC, however, it _does_ make a difference, because the upper 128 characters of the PC's character set _are_ printable, and are numbered from 128 through 255. Logic would seem to indicate that 'unsigned char' is the reasonable choice for the default on a C compiler for the PC.

Unfortunately, most other C implementations, especially UNIX C implementations, seem to default char to 'signed'. (Note that I've been assured of this by knowledgeable sources, but don't have any first hand knowledge, so I could be wrong.) This is a reasonable choice because, in the original K&R C definition, there is no 'signed' keyword. Therefore, everything should default to 'signed', because if it defaults to 'unsigned', there's no way to change it to 'signed'. Many implementations now include the 'signed' keyword, however. I don't know if it is a part of the proposed ANSI standard, but I think that it probably is.

Now, Microsoft apparently decided to change their default for chars from 'unsigned', which is what it was in versions of the compiler previous to Ver 4.0, and which makes sense for a PC, to 'signed', which makes sense because of K&R's lack of a 'signed' keyword, and because most other implementations are that way.

The original poster got bitten because Microsoft used a 'char' (which could be negative) as an array index, instead of casting it to 'unsigned char', in one of their library header files.

Perhaps the most general, portable solution is not to use char variables for counting or array indexing. If you need a counter, use a short, which will default to signed unless you say otherwise. If you need an array index, cast to an 'unsigned char' or an 'unsigned short'.
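
A minimal sketch of the indexing advice, not Microsoft's actual header code. It assumes an ANSI-style compiler that provides <limits.h> and function prototypes (K&R C has neither), and the names is_high, init_table, lookup and char_is_signed are all hypothetical, made up for illustration. The point is only that casting to 'unsigned char' keeps the index non-negative whichever way plain char defaults.

```c
#include <limits.h>   /* CHAR_MIN, UCHAR_MAX (assumes an ANSI-style compiler) */

/* Hypothetical lookup table: one entry per possible unsigned char value. */
static unsigned char is_high[UCHAR_MAX + 1];

/* Fill the table: flag every character code from 128 upward. */
static void init_table(void)
{
    int i;

    for (i = 0; i <= UCHAR_MAX; i++)
        is_high[i] = (unsigned char)(i >= 128);
}

/*
 * Safe lookup.  Without the cast, a signed char holding one of the
 * PC's upper 128 characters would be negative and would index
 * memory in front of the table.
 */
static int lookup(char c)
{
    return is_high[(unsigned char)c];
}

/* Returns 1 if plain char is signed on this compiler, 0 if unsigned. */
static int char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

After calling init_table() once, lookup() should give the same answer for a character such as (char)0xE9 whether the compiler defaults char to signed or unsigned; char_is_signed() merely reports which default is in effect.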

Unfortunately, there is no guarantee that a short is as small as a char, so using a short as a counter may waste some space. Worse, there is no guarantee that a short is as _long_ as a char, although I doubt there is any implementation where a short is actually shorter than a char.

You currently can't count on whether a char will be signed or unsigned. Does the proposed ANSI standard address this?

Fortunately, with MSC Ver 4.0, you can have your cake and eat it too. There is a command-line option to the compiler that will change the default from 'signed' to 'unsigned'. I think it's '-J', but I'm not certain, since I'm at home and my manuals are at work.

Brent

--
Brent Chapman
chapman@cory.berkeley.edu or ucbvax!cory!chapman