Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!snorkelwacker.mit.edu!bloom-picayune.mit.edu!news
From: scs@adam.mit.edu (Steve Summit)
Newsgroups: comp.lang.c
Subject: Circumspect programming (was: Evaluation of if's)
Message-ID: <1991Jun20.012131.26756@athena.mit.edu>
Date: 20 Jun 91 01:21:31 GMT
Sender: news@athena.mit.edu (News system)
Reply-To: scs@adam.mit.edu
Organization: Thermal Technologies, Cambridge, MA
Lines: 195

This protracted debate has illustrated two subtly but significantly different ways of thinking about expressions such as

    (i = 1) == (i = 2)

One school of thought says "the expression contains two assignments to the same object, therefore it's undefined. Period; end of report."

The second school says "Yes, we understand that you can't tell whether (i = 1) or (i = 2) happens first, but it's still the case that it boils down to (1 == 2) which is always false, right?"

The first school says "No, it's not an order of evaluation problem; the fact that there are two assignments renders the whole expression undefined, and anything can happen."

The second school says "Yes, we understand that you can't tell what value i will end up with, but the value of each assignment is unambiguously its right-hand side, right?"

And so it goes. The first school keeps saying "it's undefined!", assuming that that fully answers the question, and it can't understand why the second school keeps asking more questions.

(Before I go any further, let me point out that I am not trying to cast any stones here. The first school, though correct, has been somewhat knee-jerk in its responses, myself included.
The second school is displaying what ought to be a healthy curiosity about "what's really going on.")

I have been leaning toward the first school ever since I was first learning C, when I read, in K&R, this line I keep quoting:

    The moral of this discussion is that writing code which depends on order of evaluation is bad programming practice in any language. Naturally, it is important to know what things to avoid, but if you don't know how they are done on various machines, that innocence may help to protect you.

Now, I'll admit that I read into this statement a bit more than it explicitly says. Whenever I see *any* "fishy" expression, whether it's

    a[i] = i++

or

    printf("%c %c\n", getchar(), getchar())

or

    printf("%d\n", i++ * i++)

or

    (i = 1) == (i = 2)

or anything else with potential multiple side effect or evaluation order ambiguities, a little alarm goes off that says "stay away!" That's all it takes. I don't start thinking about what the compiler might reasonably (or unreasonably) do, or looking at the assembly output, or reading through documentation trying to discover if some subpart of the expression might have a defined value. (I don't try to discover "how they are done on various machines.") I call this good, safe programming.

I used the word "circumspect" in the Subject: line, but it could also be labeled "conservative." Someone will likely label it (pejoratively) as "paranoid," as if one shouldn't have to worry about such things, or as if one ought to be able to take advantage of unspecified or undefined nuances if the code in question doesn't have to be portable, or as if casting anything that even hints at undefinedness out of one's programming vocabulary would be unacceptably restrictive to one's creativity. I have found none of these restrictions stifling; in fact they are quite liberating, in that I almost never have to track down stupid, subtle bugs, or move mountains to port code.
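The "stay away" reflex has a practical corollary: each fishy expression can be rewritten so that every side effect is its own full expression, which leaves nothing undefined or unspecified. The following is a minimal sketch of one way to do that for the four expressions listed earlier. The function names are hypothetical, and because the originals have no defined meaning, the intended behavior of each rewrite (for instance, that a[i] = i++ was meant to store the old i in a[old i]) is an assumption noted in the comments.

```c
#include <stdio.h>

/* a[i] = i++, ASSUMING the intent was "store the old i in a[old i], then
   increment i". Each side effect is now its own full expression. */
static void store_then_increment(int a[], int *i)
{
    a[*i] = *i;
    (*i)++;
}

/* printf("%c %c\n", getchar(), getchar()), ASSUMING the intent was to print
   the first character read, then the second. The two reads are in separate
   declarations, so their order is fixed. Returns printf's character count,
   or EOF if either read fails. */
static int print_two_chars(FILE *in)
{
    int first = getc(in);
    int second = getc(in);

    if (first == EOF || second == EOF)
        return EOF;
    return printf("%c %c\n", first, second);
}

/* printf("%d\n", i++ * i++), ASSUMING the intent was old_i * (old_i + 1),
   with i left two higher. Returns the product instead of printing it. */
static int product_of_successive(int *i)
{
    int first = (*i)++;
    int second = (*i)++;
    return first * second;
}

/* (i = 1) == (i = 2): the two assignments to i now live in two statements,
   so the comparison is well defined (and false), and i ends up as 2. */
static int compare_assignments(int *i)
{
    int first = (*i = 1);
    int second = (*i = 2);
    return first == second;
}
```

None of these rewrites requires anyone to reason about what the original expression "really" does; if the assumed intent is wrong, the fix is to change the statements, not to study the compiler's output.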
In an earlier article on this topic, I mentioned that "The comp.lang.c frequently-asked questions list has a bit to say about undefined order of evaluation." A number of people have taken me to task for this, saying that the FAQ list answer doesn't cover

    (i = 1) == (i = 2)

at all. Now, I didn't claim that it answered the current question (in fact, it mentions "order of evaluation," which we've agreed this problem isn't), but I will admit that, to me, the FAQ list answer does cover both cases, in that the same alarm bell -- evoked by the same "innocence may help to protect you" quote -- goes off either way.

I hope this article doesn't sound too pompous, or holier-than-thou, or us vs. them. There are obviously quite a few people in what I have called the "second school," and it would be quite insensitive of me to just say that they should all think the way I do. (However, I do have to admit that wondering if there can be meaning in

    (i = 1) == (i = 2)

even though it's explicitly undefined, seems rather like wondering if one can be a little bit pregnant.)

Now, it may be that some of the people who are keeping this thread alive aren't really worried about the (undefined) expression

    (i = 1) == (i = 2)

at all, but are rather simply wondering whether the value of the expression

    i = 1

is "one" or "the value of i." (There have even been suggestions made that the answer is somehow different for ANSI C than "Classic" C, and that the ANSI Standard answer therefore isn't relevant for pre-ANSI compilers.)

This starts looking like a hard question to answer, because you can't find words in the Standard (or in any number of C reference books) which explicitly answer it. The answer isn't written down explicitly because it's so simple: *it doesn't matter*. It is defined that the value of an assignment expression is the value of the right-hand side, cast to the type of the left-hand side.
In a correct program (one which doesn't have multiple assignments, within the same expression, to the same object, in particular to the one on the left-hand side) there is absolutely no detectable difference between "the value of the right-hand side, cast to the type of the left-hand side" and "the value (after the assignment) of the left-hand side," because "the value of the right-hand side, cast to the type of the left-hand side" is precisely what gets assigned to the left-hand side. A compiler writer therefore has complete freedom to emit code which either re-fetches the left-hand side or uses the coerced value of the right-hand side. As long as there can't be other intervening assignments to the left-hand side, it can't matter which choice is made.

This is an excellent example of how an explicitly undefined area of the language (i.e. that it's undefined what happens if you modify the same object twice within one expression) allows the compiler writer a useful freedom. Compiler writers are then likely to make use of that freedom, and to write different compilers that implement the undefined areas in different ways. Programmers are therefore strongly advised to leave the undefined areas well alone, lest they break their side of the contract (i.e. the standard) and yank the rug out from under the compiler writer (and, more significantly, themselves) by instigating a case the compiler writer was allowed to assume "couldn't happen."

This explains why the "first school" keeps harping on the "no multiple side effects to the same object" rule, which is really the relevant issue. If there aren't multiple side effects to the same object, assignment semantics aren't confusing (or worth talking about); and if there are, the expression is undefined, so it's really not worth talking about.
(Note, too, that the situation is not any more undefined under the ANSI rules than it was before: compilers have always been free to -- and I am aware of pre-ANSI compilers which do -- implement

    (i = 1) == (i = 2)

in the "surprising" or "wrong" way.)

The final case, which has been raised by a few alert correspondents, concerns the value of

    i = 1

when i is volatile. The volatile qualifier is new with ANSI C (and C++), so there is not as much experience with it. As Chris (and perhaps others) has already pointed out, the semantics of volatile objects are themselves not very fully defined by the Standard, but are left to the implementation, so we can't answer this last question definitively. The value of i = 1 when i is volatile might be guaranteed to be one, or it might be guaranteed to be the fetched value of i (which is not necessarily one, even in the absence of intervening asynchronous stores to i, if i is a hardware device register with special read/write semantics). Presumably, a conscientious vendor will think about this case, define a reasonable behavior, and document it well.

                                            Steve Summit
                                            scs@adam.mit.edu