Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!bloom-beacon!athena.mit.edu!scs
From: scs@athena.mit.edu (Steve Summit)
Newsgroups: comp.lang.c
Subject: Re: getch() and getche() in MSC 4.0
Message-ID: <7594@bloom-beacon.MIT.EDU>
Date: 21 Oct 88 06:54:53 GMT
References: <10508@dartvax.Dartmouth.EDU>
Sender: daemon@bloom-beacon.MIT.EDU
Reply-To: scs@adam.pika.mit.edu (Steve Summit)
Lines: 153

This is a snide, whiney "I told you so" to the efficiency addicts and macro panderers out there.

In article <10508@dartvax.Dartmouth.EDU> Scott Horne writes:
>Has anyone else had trouble with getch() and getche() in Microsoft C v. 4.0?
>They often skip every other keypress on me--and in one case, they skip two
>keypresses out of three!  Maybe it's my code.  This occurs mainly when I try
>
>	c = toupper(getch());

(getch and getche are fairly pointless and superfluous low-level analogues to getchar, but this is irrelevant.)

In the old days, the toupper macro worked correctly only on lowercase alphabetic characters, which meant that one often ended up writing

	if(islower(c))
		c = toupper(c)

The hackers at the Shady Hill home for arthritic-fingered programmers got tired of typing this, so a variant appeared: toupper could be made to work correctly (a laudable goal) with an implementation such as:

	#define _toupper(c) ((c) - ('a' - 'A'))
	#define toupper(c) (islower(c) ? _toupper(c) : (c))

Now, there are three conventions for writing macros:

	1. Parenthesize fully, inside and out

	2. Use capital letters in the name, to remind the reader it's a macro and may therefore act weird

	3. Make every effort not to repeat "arguments," so that side effects aren't replicated

A "side effect" is anything that an expression does other than "return" a value, and is therefore a problem if something like

	toupper(*p++)

is (textually, before the code generator gets to it) expanded to

	islower(*p++) ? _toupper(*p++) : *p++

How many times is p incremented?

Besides pre- and postincrement and -decrement, the other classic example of a side effect is I/O. What a coincidence: look at what Scott Horne used as an argument to toupper, and note the curious concordance between the period of its failure mode (two out of three) and the number of times toupper's argument is repeated in its expansion.

Rule 2 is occasionally broken by "standard library" facilities, but generally only when rule 3 is observed, so that the distinction between function and macro is transparent to the caller. The "improved" toupper macro, scrupulous as it is in its adherence to rule 1, violates both rules 2 and 3, and is therefore a perfect ticking time bomb long term booby trap of a recurring nightmare for unsuspecting programmers everywhere.

If it is desirable for toupper to work correctly on characters that are nonalphabetic or already upper-case (I believe this property is called "idempotence," and as I said, it is a laudable goal), then the macro implementation has to be sacrificed, and toupper() made a proper function.

By the way, the fancy toupper macro also violates a fourth rule, almost universally ignored today, which is that macros shouldn't expand to "too much" code, because in the old days we only had 64K or so to play with, and every byte counted. The most famous exception is the recent Berkeley line-buffered putc macro, which is something like seven backslash-continued lines long, although, believe it or not, it does manage to guarantee a single evaluation of its first argument, so putc(*p++, fd) will work, as indeed it must. One would try something ludicrous like

	FILE *fdarray[10];
	...
	putc(c, fdarray[i++])

at one's extreme peril, however.

Now, with respect to Microsoft, their run-time library gets tugged in several directions as they try to maintain compatibility with existing code while migrating toward ANSI, and in version 4 I believe they had two separate versions of toupper, depending on which header file you #included.
To make things even more confusing, I think one header file gave you the unsafe macro I'm disparaging, and the other got you a real function. (Of course, there was also a third implementation, called "_toupper", which is the non-checking version, safely implementable as a macro, such as appears in the example towards the beginning of this article.)

(These difficulties may be resolved in Microsoft's Version 5. Although I happen to use Microsoft V5, I don't pay much attention to its or anyone's implementation of islower/toupper any more. Any code of mine that cares protects itself with

	#ifdef _toupper
	#undef toupper
	#define toupper _toupper
	#endif

which recreates, with only the barest twinges of worry about undermining _reserved ANSI identifiers, a cozy V7 environment. I'll call islower() explicitly, thank you. Note that I do this not for efficiency's sake but for safety; an even more likely side-effect-containing argument for ctype macros than getch() is *p++.)

The bottom line is, don't implement things with macros unless it's absolutely safe. The potential efficiency improvements simply aren't worth it when they lead to these "little surprises." In those rare cases where the efficiency gain is significant and important, capitalize the hell out of the macro name and plaster the code and documentation with big warnings, and budget some time for the confusion and stubborn bugs which will still inevitably arise.

Speaking of documentation: some will haughtily tell the original complainant to RTFM; Microsoft's manual may well state that toupper is a macro and can't be used on arguments with side effects. That's unacceptable. Someone coined a nice phrase called the "principle of least surprise." Among other things, it holds that there is a class of mistakes which are so easy to make that no amount of documentation will rescue them; the only solution is to remove the problem, in this case the dangerous macro implementation.

Let's not get started on tweaks to the preprocessor to make dangerous macros safer to write; we just spent a month or so exhaustively treating how not to square numbers. If you want to work on something, work on good inlining algorithms instead.

And before you think that your proposed improvements to the preprocessor make whacko macros safe, or even that the three or four rules listed above are sufficient, consider

					putc(c,
						fd);

which is what people like me write when we've indented ourselves into a brick wall at the right margin but are for some stupid reason reluctant to break out into another subroutine. Although ANSI says macro invocations are allowed to cross newline boundaries, there are a lot of existing preprocessors which can't handle them without explicit backslash continuations. (I can't say I blame them, macro invocations spanning newlines being rather extremely painful to implement correctly.)

                                            Steve Summit
                                            scs@adam.pika.mit.edu