From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: One more point regarding = and == (more flamage)
Message-ID: <11563@dog.ee.lbl.gov>
Date: 28 Mar 91 17:45:38 GMT
References: <925@isgtec.UUCP> <1991Mar26.184245.3538@chinet.chi.il.us>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 133
X-Local-Date: Thu, 28 Mar 91 09:45:38 PST

[`Hey Rocky, watch me pull just one more point out of my hat'
`That trick NEVER works!' `This time for sure']

>>>  a) while (*foo++ = *bar++)
>>>  b) while (*foo ++ == *bar++)
>>>  c) while ((*foo++ = *bar++) != 0)

>In article <925@isgtec.UUCP> robert@isgtec.UUCP writes:
>>Well the biggest argument has been if you use a) the maintainer can't tell
>>if you meant a) or b);  if you use c) the maintainer KNOWS you meant a).
>>This isn't rubbish.

As we all know by now, I happen to agree with this sentiment, but much
more so when applied to `if'; `while' errors of this sort are less
common.  The following `just one more point' explains why.

In article <1991Mar26.184245.3538@chinet.chi.il.us> les@chinet.chi.il.us
(Leslie Mikesell) writes:
>If you assume that the programmer didn't make a mistake (i.e. typed
>what he was thinking), then a) is just as obvious as c).  If you
>assume that he did make a mistake, then c) is probably more likely
>to be wrong than a).  More characters = more chances to screw up.

This would be true but for the fact that coding is done by *people*.

Human error rate is a `jittery' function.  Although a number of
studies have shown remarkable consistency in the error rate measured
as `number of errors found divided by number of source lines', it
is also the case that people use more care with `complicated'
constructs.  That is, people are more likely to leave an uncorrected
error behind when typing

	The quick brown fox jumps over the lazy dog

than when typing

	2.718281828459045235360

I spent more time checking the above expansion of `e' than I did typing
this entire sentence.

Note also that, in addition to the fact that error rate is not a
monotonic function of `number of characters typed', error studies
typically find different `kinds' of errors.  One important kind of
error is the `typo', or typographical error, and this one really *is* a
function of the number and placement of characters typed.  Typographical
errors take three forms:

	transpositions	(`The quick bronw fox jumps over hte lazy dog')
	insertions	(`The quiick brown fox jumpsd over the lazy dog')
	deletions	(`The quick brown fox umps over the lazy dog')

Typographical errors are, if not the most common form of error, certainly
among the top contenders.

Keeping these in mind, let us consider C code.

After one becomes familiar with C, constructs like

	if ((c = getchar()) != EOF)

become `natural' and one does not think twice when writing them.  In
many languages (not just C, although C is rare in its particular
spelling) constructs like

	if (a == b)

are also `natural' and again one does not think twice.  Now, most
errors can be caught before they happen, just by thinking twice.  So
if people found

	if (a == b)

unfamiliar, they would check again and possibly discover that they had,
by mistake, typed in

	if (a = b)

---but `if (a == b)' is too familiar to bother rechecking, and such
typos go unnoticed.

Thus, when I (as a software maintainer) find

	if (a = b)

I must consider this a `red flag' signifying a possible error, while

	if ((a = b) != 0)

is quite unlikely to be a typo.
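
A minimal sketch of the distinction, in case it helps to see the two
forms side by side.  The function names here are hypothetical and are
not taken from any code under discussion.  The explicit `!= 0' does not
change what the assign-and-test does; it only marks the single `=' as
deliberate.

```c
#include <assert.h>

/*
 * Hypothetical helper: store src through dst and report whether the
 * stored value was nonzero.  Behaves the same as `if (*dst = src)',
 * but the explicit comparison shows the assignment is intended.
 */
int assign_and_test(int *dst, int src)
{
	assert(dst != 0);
	if ((*dst = src) != 0)
		return 1;
	return 0;
}

/*
 * The form that a bare `if (a = b)' is most easily mistaken for:
 * nothing is stored, the two values are only compared.
 */
int compare_only(int a, int b)
{
	if (a == b)
		return 1;
	return 0;
}
```

Note that assign_and_test(&x, 0) stores 0 in x and reports false, while
compare_only(3, 0) leaves both operands alone and also reports false:
the results can agree even though the side effects differ, which is
part of why the typo is hard to spot by testing alone.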

On the other hand, while loops of the form

	while (*a++ == *b++)

are considerably more rare.  It is therefore more likely that whoever
wrote

	while (*a++ = *b++)

really intended the assignment.  Still, deletions are a common form
of typographical error; perhaps the single `=' is a mistake anyway.
If the assignment is intended,

	while ((*a++ = *b++) != 0)

is a clear flag that `there is no deletion typo here'.  If the latter
is what was meant but

	while ((*a++ == *b++) != 0)

actually appears, this acts as another flag: it is unusual for people
to use the result of a comparison in anything but a `direct boolean'
context (if, while, &&, etc.).
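
The two loops can be sketched as follows.  This is an illustration
only: copy_string and count_matching are hypothetical names, and the
extra `*a != 0' guard in the second function is my addition so that
the example cannot run off the end of two identical strings.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assignment loop: copy the NUL-terminated string at src to dst,
 * terminator included, and return the number of bytes stored.
 * The caller must supply a dst that is large enough.
 */
size_t copy_string(char *dst, const char *src)
{
	size_t n = 1;		/* the terminating NUL is always stored */

	assert(dst != 0 && src != 0);
	while ((*dst++ = *src++) != 0)
		n++;
	return n;
}

/*
 * Comparison loop: store nothing, just count how many leading
 * characters of a and b are equal.  The `*a != 0' guard (an
 * assumption added for safety) stops at the end of a.
 */
size_t count_matching(const char *a, const char *b)
{
	size_t n = 0;

	assert(a != 0 && b != 0);
	while (*a != 0 && *a++ == *b++)
		n++;
	return n;
}
```

The loop headers differ by one character, but one function writes
through its first argument and the other only reads.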

In other words, it all comes down to these facts:

  * Embedded assign-and-test is common enough not to get rechecked.

  * Typographic errors of deletion and of doubling (`quiick') are
    very common.

Combining these leads to the two mistakes below:

	if (a = b)			/* oops */
		foo();
	while (n < lim)
		n == f(n);		/* oops */

both of which draw warnings from many compilers.
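
For comparison, here is a sketch of what was presumably intended in
each case.  The names step, run_to_limit, and call_if_equal are
hypothetical stand-ins for f, the while loop, and the if statement
above.  As far as I know, current versions of GCC and Clang diagnose
both of the mistaken forms when invoked with -Wall (an assignment used
as a truth value, and a comparison whose result is unused); other
compilers may differ.

```c
#include <assert.h>

/* Hypothetical stand-in for f(): advance n by one. */
int step(int n)
{
	return n + 1;
}

/*
 * Intended form of the loop: `=' stores the result, so n advances
 * and the loop ends.  With `n == step(n);' the comparison result
 * would be discarded, n would never change, and the loop would not
 * terminate whenever n < lim on entry.
 */
int run_to_limit(int n, int lim)
{
	while (n < lim)
		n = step(n);
	return n;
}

/*
 * Intended form of the test: compare a and b without modifying
 * either; *calls counts how often the guarded branch was taken.
 */
int call_if_equal(int a, int b, int *calls)
{
	assert(calls != 0);
	if (a == b) {
		(*calls)++;
		return 1;
	}
	return 0;
}
```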
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov