Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!ima!haddock!karl
From: karl@haddock.UUCP
Newsgroups: comp.lang.c
Subject: Re: is it really necessary for character values to be positive?
Message-ID: <306@haddock.UUCP>
Date: Sun, 18-Jan-87 16:49:16 EST
Article-I.D.: haddock.306
Posted: Sun Jan 18 16:49:16 1987
Date-Received: Mon, 19-Jan-87 02:03:39 EST
References: <39@houligan.UUCP> <289@haddock.UUCP> <598@mcgill-vision.UUCP>
Reply-To: karl@haddock.ISC.COM.UUCP (Karl Heuer)
Organization: Interactive Systems, Boston
Lines: 76
Summary: It's worse than I thought

In article <598@mcgill-vision.UUCP> mcgill-vision!mouse (der Mouse) writes:
>In article <289@haddock.UUCP>, karl@haddock.UUCP (Karl Heuer) writes:
>> Suppose I am using such a system, and one of the characters -- call
>> it '@' -- has a negative value.  The following program will not work:
>>     main() { int c; ... c = getchar(); ... if (c == '@') ... }
>> ... Any printing character that I want to enclose in single quotes had
>> better be positive, or it becomes VERY awkward to use.
>
>Well.  Now, exactly what does it mean to say that @ is negative?
>Presumably it means that the test below will succeed:
>    char c = '@'; if (c < 0) ...

Actually what I meant was simply that "if ('@' < 0) ..." would succeed.
This is not the same thing, since '@' has type int.  Your test says only
that char is implemented as a signed datatype, and that '@' has the high
bit set.

>Notice that you can't make '@' the same thing as what getchar() returns,
>because [char s[N]; if (s[0] == '@') ...] will fail.

That's the flip side of the problem, which I overlooked in my posting.
The problem is independent of single-quotes; any machine on which
characters are signed will fail to handle the test (getchar() == s[0]).
The only reason it "worked" so well on the pdp11 was that *in practice*,
all the chars one has to deal with (I'm assuming text characters, not
one-byte integers) were 7-bit, so it didn't matter whether they were
sign-extended (as with s[0]) or unsigned (as with getchar()).

>About the neatest solution I see is to make 'x' have type unsigned char
>rather than int, at least when there's only one character between
>quotes.  Then we also have to arrange that char and unsigned char
>are not promoted to int in expressions not involving anything bigger
>than char.  This should make both of these work.

I dunno.  A simpler solution is to assert that plain char is unsigned
char.  As I said before, I suspect the adopted solution will be that in an
8-bit environment plain char will be unsigned char; the only
default-signed-char compilers will be on pdp11-like machines in 7-bit
environments.

>(is there any code out there *using* multi-char character constants?)

If so, it's almost all nonportable.  The only portable use I've seen was
one I wrote for a program that dealt with the two-letter codes found in
termcap, troff, etc:
    "switch (s[0]*'\1\0' + s[1]*'\0\1') { case 'xy': ...; }".
I ended up not using it anyway, since lint didn't like it.  (But it is
independent of byte size and byte ordering.)

[From article <600@mcgill-vision.UUCP>, same author, again quoting kwzh]
>> [Your suggestion] supports my contention that making getchar() an int
>> function was a mistake in the first place.**

I am now even more sure, btw, that making it (int)(unsigned char)c was
wrong.  (Perhaps, as someone else suggested, (int)c would have been
better; provided EOF is defined as something out-of-band like 0x8000.)

>> **I do have what I think is a better idea, but I'm not going to
>> describe it in this posting.

(This was because I tend to do a lot of my posting in the wee hours of the
morning, and I didn't trust myself to give any details.)

>How about in another posting then?

Stay tuned.
I'll probably be posting it to comp.lang.misc (since "it isn't C anymore")
sometime in February (not sooner; I have a big project due).  Look for
"Error handling".

>What I normally do is something more like [char c; /*!*/ ... c = getchar();
>if (feof(stdin)) ...] ie, *ignore* the EOF return and check explicitly.

I think that's a better model in that it doesn't rely on the ability to
cast char into a larger type; the problem is that it's cumbersome.  The
common idiom "while ((c = getchar()) != EOF) ..." has to be written with a
comma ("while (c = getchar(), !feof(stdin)) ...") or a test-in-the-middle
loop ("for (;;) { c = getchar(); if (feof(stdin)) break; ... }").

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint