Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!snorkelwacker.mit.edu!bloom-picayune.mit.edu!news
From: scs@adam.mit.edu (Steve Summit)
Newsgroups: comp.lang.c
Subject: Circumspect programming (was: Evaluation of if's)
Message-ID: <1991Jun20.012131.26756@athena.mit.edu>
Date: 20 Jun 91 01:21:31 GMT
Sender: news@athena.mit.edu (News system)
Reply-To: scs@adam.mit.edu
Organization: Thermal Technologies, Cambridge, MA
Lines: 195

This protracted debate has illustrated two subtly but
significantly different ways of thinking about expressions such
as

	(i = 1) == (i = 2)

One school of thought says "the expression contains two
assignments to the same object, therefore it's undefined.
Period; end of report."

The second school says "Yes, we understand that you can't tell
whether (i = 1) or (i = 2) happens first, but it's still the case
that it boils down to (1 == 2) which is always false, right?"

The first school says "No, it's not an order of evaluation
problem; the fact that there are two assignments renders the
whole expression undefined, and anything can happen."

The second school says "Yes, we understand that you can't tell
what value i will end up with, but the value of each assignment
is unambiguously its right-hand side, right?"

And so it goes.  The first school keeps saying "it's undefined!",
assuming that that fully answers the question, and it can't
understand why the second school keeps asking more questions.

(Before I go any further, let me point out that I am not trying
to cast any stones here.  The first school (myself included),
though correct, has been somewhat knee-jerk in its responses.  The
second school is displaying what ought to be a healthy curiosity
about "what's really going on.")

I have been leaning toward the first school ever since I was
first learning C, when I read, in K&R, this line I keep quoting:

	The moral of this discussion is that writing code which
	depends on order of evaluation is bad programming
	practice in any language.  Naturally, it is important to
	know what things to avoid, but if you don't know how they
	are done on various machines, that innocence may help to
	protect you.

Now, I'll admit that I read into this statement a bit more than
it explicitly says.  Whenever I see *any* "fishy" expression,
whether it's

	a[i] = i++

or

	printf("%c %c\n", getchar(), getchar())

or

	printf("%d\n", i++ * i++)

or

	(i = 1) == (i = 2)

, or anything else with potential multiple-side-effect or
evaluation-order ambiguities, a little alarm goes off that says
"stay away!"  That's all it takes.  I don't start thinking about
what the compiler might reasonably (or unreasonably) do, or
looking at the assembly output, or reading through documentation
trying to discover if some subpart of the expression might have a
defined value.  (I don't try to discover "how they are done on
various machines.")
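
For what it's worth, each of the fishy expressions above can be
split up so that no object is modified more than once between
sequence points.  The following is only a sketch: the function
names (store_then_bump, product_then_bump2, print_two_chars) are
made up for illustration, and each rewrite picks just one of the
meanings the writer of the original expression might have had in
mind.

```c
#include <stdio.h>

/* One possible meaning of "a[i] = i++": store the old value of i
   in a[old i], then increment.  Returns the new value of i. */
int store_then_bump(int a[], int i)
{
    a[i] = i;
    i++;
    return i;
}

/* One possible meaning of "i++ * i++": multiply two successive
   values of *ip, leaving *ip advanced by two. */
int product_then_bump2(int *ip)
{
    int first = *ip;
    int second = *ip + 1;

    *ip += 2;
    return first * second;
}

/* Instead of printf("%c %c\n", getchar(), getchar()): read the two
   characters in separate statements so the order is explicit. */
void print_two_chars(FILE *in, FILE *out)
{
    int c1 = getc(in);
    int c2 = getc(in);

    if (c1 != EOF && c2 != EOF)
        fprintf(out, "%c %c\n", c1, c2);
}
```

Each version costs a line or two more than the original, but
there is nothing left for the alarm to go off about.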

I call this good, safe programming.  I used the word "circumspect"
in the Subject: line, but it could also be labeled "conservative."
Someone will likely label it (pejoratively) as "paranoid," as if
one shouldn't have to worry about such things, or as if one ought
to be able to take advantage of unspecified or undefined nuances
if the code in question doesn't have to be portable, or as if
casting anything that even hints at undefinedness out of one's
programming vocabulary would be unacceptably restrictive to one's
creativity.  I have found none of these restrictions stifling; in
fact they are quite liberating, in that I almost never have to
track down stupid, subtle bugs, or move mountains to port code.

In an earlier article on this topic, I mentioned that "The
comp.lang.c frequently-asked questions list has a bit to say
about undefined order of evaluation."  A number of people have
taken me to task for this, saying that the FAQ list answer
doesn't cover

	(i = 1) == (i = 2)

at all.  Now, I didn't claim that it answered the current
question (in fact, it mentions "order of evaluation," which we've
agreed this problem isn't), but I will admit that, to me, the FAQ
list answer does cover both cases, in that the same alarm bell --
evoked by the same "innocence may help to protect you" quote --
goes off either way.

I hope this article doesn't sound too pompous, or holier-than-
thou, or us vs. them.  There are obviously quite a few people in
what I have called the "second school," and it would be quite
insensitive of me to just say that they should all think the way
I do.  (However, I do have to admit that wondering if there can
be meaning in

	(i = 1) == (i = 2)

, even though it's explicitly undefined, seems rather like
wondering if one can be a little bit pregnant.)

Now, it may be that some of the people who are keeping this
thread alive aren't really worried about the (undefined)
expression

	(i = 1) == (i = 2)

at all, but are rather simply wondering whether the value of the
expression

	i = 1

is "one" or "the value of i."  (There have even been suggestions
made that the answer is somehow different for ANSI C than
"Classic" C, and that the ANSI Standard answer therefore isn't
relevant for pre-ANSI compilers.)

This starts looking like a hard question to answer, because you
can't find words in the Standard (or in any number of C reference
books) which explicitly answer it.  The answer isn't written down
explicitly because it's so simple: *it doesn't matter*.  It is
defined that the value of an assignment expression is the value of
the right-hand side, cast to the type of the left-hand side.  In
a correct program (one which doesn't have multiple assignments,
within the same expression, to the same object, in particular to
the one on the left-hand side) there is absolutely no detectable
difference between "the value of the right-hand side, cast to the
type of the left-hand side" and "the value (after the assignment)
of the left-hand side," because "the value of the right-hand
side, cast to the type of the left-hand side" is precisely what
gets assigned to the left-hand side.
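
A small sketch may make the "cast to the type of the left-hand
side" part concrete.  The function name assign_value_demo and its
parameters are invented for this illustration; the point is only
that, in a correct program, the value of the assignment and the
value left in the variable cannot be told apart.

```c
/* The value of (i = d) is d converted to the type of i, which is
   also exactly what i holds afterwards.  The stored value is handed
   back through *stored so the two can be compared. */
double assign_value_demo(double d, int *stored)
{
    int i;
    double result;

    result = (i = d);   /* d is converted to int before being stored */
    *stored = i;
    return result;
}
```

Called with 3.7, this returns 3.0 and sets *stored to 3: the
fraction is discarded by the conversion to int, and both ways of
describing the value of the assignment agree.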

A compiler writer therefore has complete freedom to arrange to
emit code which either re-fetches the left-hand side, or uses the
coerced value of the right-hand side.  As long as there can't be
other intervening assignments to the left-hand side, it can't
matter which choice is made.  This is an excellent example of how
an explicitly undefined area of the language (i.e. that it's
undefined what happens if you modify the same object twice within
one expression) allows the compiler writer a useful freedom.
Compiler writers are likely to make use of that freedom, and to
write different compilers that implement the undefined areas in
different ways.  Programmers are therefore strongly advised to
leave the undefined areas well alone, lest they break their side
of the contract (i.e. the standard) and yank the rug out from
under the compiler writer (and, more significantly, themselves)
by instigating a case the compiler writer was allowed to assume
"couldn't happen."

This explains why the "first school" keeps harping on the "no
multiple side effects to the same object" rule, which is really
the relevant issue.  If there aren't multiple side effects to the
same object, assignment semantics aren't confusing (or worth
talking about); and if there are, the expression is undefined, so
it's really not worth talking about.
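
If someone really did want "assign 1, assign 2, and compare the
two values," it can be written so that the rule is respected.
This is a sketch under that assumed intent; compare_assigned is a
hypothetical name, not anything from the thread.

```c
/* A defined replacement for "(i = 1) == (i = 2)": each assignment
   is its own statement, so *ip is modified only once between
   sequence points, and the two values are compared through
   separate temporaries. */
int compare_assigned(int *ip)
{
    int first, second;

    *ip = 1;
    first = *ip;
    *ip = 2;
    second = *ip;
    return first == second;   /* 1 is never equal to 2 */
}
```

Here the result is 0 and the object ends up holding 2, on every
conforming compiler, because the programmer (not the compiler)
chose the order.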

(Note, too, that the situation is not any more undefined under
the ANSI rules than it was before: compilers have always been
free to -- and I am aware of pre-ANSI compilers which do --
implement

	(i = 1) == (i = 2)

in the "surprising" or "wrong" way.)

The final case, which has been raised by a few alert
correspondents, concerns the value of

	i = 1

when i is volatile.  The volatile qualifier is new with ANSI C
(and C++), so there is not as much experience with it.  As Chris
(and perhaps others) have already pointed out, the semantics of
volatile objects are themselves not very fully defined by the
Standard, but are left to the implementation, so we can't answer
this last question definitively.  The value of

	i = 1

when i is volatile might be guaranteed to be one, or it might be
guaranteed to be the fetched value of i (which is not
necessarily one, even in the absence of intervening asynchronous
stores to i, if i is a register with special read/write
semantics).  Presumably, a conscientious vendor will think about
this case, define a reasonable behavior, and document it well.
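
In the meantime, the circumspect approach is not to depend on the
value of the assignment expression at all when the left-hand side
is volatile.  The sketch below (store_and_read_back is a made-up
name) separates the store from the fetch, so the code says which
one it wants regardless of what a given vendor decides.

```c
/* Rather than relying on the value of "*reg = value" when *reg is
   volatile, store first and then fetch explicitly if the read-back
   value is what is wanted. */
int store_and_read_back(volatile int *reg, int value)
{
    int seen;

    *reg = value;    /* the store */
    seen = *reg;     /* a separate, explicit fetch */
    return seen;
}
```

For an ordinary volatile int in memory that nothing else touches,
the fetched value is simply the value stored; for a device
register it is whatever the hardware returns.  A caller who wants
the stored value instead can just use its own copy of value.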

                                            Steve Summit
                                            scs@adam.mit.edu