Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site utcsri.UUCP
Path: utzoo!utcsri!greg
From: greg@utcsri.UUCP (Gregory Smith)
Newsgroups: comp.lang.c
Subject: Re: short circuit evaluation
Message-ID: <4315@utcsri.UUCP>
Date: Thu, 5-Mar-87 17:36:39 EST
Article-I.D.: utcsri.4315
Posted: Thu Mar  5 17:36:39 1987
Date-Received: Thu, 5-Mar-87 21:06:35 EST
References: <844@wanginst.EDU> <4211@utcsri.UUCP> <609@viper.UUCP>
Reply-To: greg@utcsri.UUCP (Gregory Smith)
Organization: CSRI, University of Toronto
Lines: 94
Summary: 

In article <609@viper.UUCP> john@viper.UUCP (John Stanley) writes:
>The example Greg provides here is misleading.  
>
>If you have three expressions:
>	1:  a+b+c
>	2: (a+b)+c
>	3:  a+(b+c)
>all three of them should not be evaluated the same way.  Greg implys that
>they should be.  This is not so.  When an equation contains '(' and ')'
>it intentionaly (and explicitly) defines the parse tree structure that will
>result.  The statement "redundant ()'s grow in C like mushrooms" may be true,
>but it doesn't give anyone the right to arbitrarily ignore explicit cues
>to the compiler.  When I don't care, I don't use them.  When I do, I do
>so for a reason..........
>
How do you tell an explicitly defined tree from an implicit one? a+b+c
parses to the same expression tree as (a+b)+c. That could be changed
(i.e. () info could be included in the expr. tree ). But what about
(a+b)*c? There is no way to represent that *expression tree* without
parentheses. How do I then write this in such a way that the compiler is
allowed to rearrange it? My point is that ()'s serve to override the
binding strengths of operators, and thus allow arbitrary expressions to
be constructed. They cannot be used to prevent optmizations,
since they are *required* in many cases, simply to write the expression
in the first place. I want my compiler to be free to change (i+3)*4+6
to i*4+18 (more on this later). If I don't want that, there are means
of avoiding it.

Furthermore, a redundant () may very well not be "intentional", in the
applicable sense of the word, and therefore should not be construed as
"explicit". more later.

>  Since there is additional, undesireable, and unnecessary overhead in the
>detection of this SUM(a1,a2,a3...,aN) special case, and since there appears
>to be little or no advantage to doing so (you have to add them up in -some-
>order, you might as well let the programmer decide as anyone), why bother?
>
Wrong. There is an advantage to doing the sum in an arbitrary order.

First of all, it rarely makes a difference to the result (if it
does, it is the programmer's fault, by definition :-) ).

It allows constants to be folded to the end (a+2)+(b+9)+4 = (a+b)+15.
It allows fewer registers to be used. ((a+b)+(c+d))+((w+x)+(y+z)),
if done literally, requires 3 working values at one point, while a
running sum never requires more than one. The programmer cannot
determine the best order, because the programmer is not writing
code specifically for a certain machine, right? :-) the compiler
is supposed to do that.

Multiple constants in running sums tend to pop up (1) from macro
expansions (2) in expressions like (p+3)->foo.y[i-1].abc ( after the
semantics have been applied ). In the latter case the programmer can't
control them because s/he can't really see them. One could of course
distinguish the programmer-created weirdness from the internally-created
ones, but why not optimize both of them? One cannot, of course,
distinguish macro-created weirdness under the current preprocessor
paradigm.

And once you convert arbitrary additions into rearrangable running sums,
it becomes very attractive to convert things like (i+7)*4 + 2 into
i*4 + 28 + 2 into i*4 + 30. Again, this comes up a *lot* with array
operations, and again, if it breaks a program, that program was probably
doomed anyway.

For more on this sort of thing, see "A Tour through the UN*X C Compiler"
in the Version 7 books.

PS if you don't believe me about the mushrooms, I give you the
expansion of getchar(putchar()):

(--((&_iob[1]))->_cnt>=0? ((int)(*((&_iob[1]))->_ptr++=
(unsigned)((--((&_iob[0]))->_cnt>=0? *((&_iob[0]))->_ptr++&0377:
_filbuf((&_iob[0])))))):_flsbuf((unsigned)((--((&_iob[0]))->_cnt>=0?
*((&_iob[0]))->_ptr++&0377:_filbuf((&_iob[0])))),(&_iob[1])));

Most of those '()'s are in there to enforce precedence against
arbitrary parameters. E.g. if I write

#define INCH_TO_CM(x) x*2.54

then INCH_TO_CM(a+b) becomes a+b*2.54, which is wrong.
To be safe, I have to write

#define INCH_TO_CM(x) ((x)*2.54)

>John Stanley (john@viper.UUCP)
>Software Consultant - DynaSoft Systems
>UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...