Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!tut.cis.ohio-state.edu!purdue!haven!mimsy!chris From: chris@mimsy.umd.edu (Chris Torek) Newsgroups: comp.graphics Subject: Re: Integral Square Root Message-ID: <21560@mimsy.umd.edu> Date: 31 Dec 89 19:55:36 GMT References: <9170@cbmvax.commodore.com> <21550@mimsy.umd.edu> <9175@cbmvax.commodore.com> Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742 Lines: 118 >In article <21550@mimsy.umd.edu> I wrote: >> if ( (rb = r+(b=1<>uses and sets rb in an unknown order. ... a compiler is free to evaluate >>the second `rb', then compute r+..., assign that to rb, multiply that by >>the old value of rb, and compare with n. In article <9175@cbmvax.commodore.com> mitchell@cbmvax.commodore.com (Fred Mitchell - PA) writes: >... Let's analyze this line more closely (exploded IF, here we come!)... > > if ( > ( rb = r + > (b=1< * rb > <= n) > >Now the one stipulation of the C standard is that INNER PARENTHESIS >GETS EVALUATED FIRST!!!! As whoever-it-was said, `That turns out not to be the case.' The forthcoming C standard, which is not yet a standard, says that parentheses must be observed when evaluating expressions where rearrangement might affect the result. For instance, the floating point expression (a - 1.0) * b must be computed that way, and not as `a * b - b'---an expression that is mathematically equivalent, but not equivalent in the strange world of floating point (finite fields anyone?). Compilers remain free to rearrange expressions in which the rearrangement is not observable (except via examination of the object code). The C standard specifically does *not* say that parentheses control the order of the evaluation of the variables contained in the expression. Their affects are limited to suppressing rearrangement. That is, the compiler can still evaluate (a - 1.0) * b as `fetch b, fetch a, subtract 1.0, multiply'. A compiler does not have to `fetch a, subtract 1.0, fetch b, multiply'. Now: >So the first thing that MUST be evaluated >first is: > (b = 1 << i) This need not be evaluated first. This expression tells the compiler that it is to compute `1 << i', and eventually (by the next `sequence point') store the result in the location determined by evaluating the name `b' in object context. The result of this expression is to be the value that would be obtained if b were examined after `1 << i' was stored in it. (The implication of the latter is that if `b' is a small storage type, such as `char', and has a large value put in it, such as 32768, and the machine does not trap on overflow, the result will be a small value, and perhaps a negative one. In this case b has the longest type available in C, so there is no need to worry about truncation.) >then: > (rb = r + b) [written as (rb = r + (b = ...))] This expression tells the compiler that it is to compute `r + ' and eventually (again, by the next sequence point) store the result in the variable rb. Again, the result is the value rb would have after the store. >and finally: > rb * rb <= n [written as `(rb = ...) * rb'] This expression tells the compiler that it is to compute ` * rb', and compare that with the value stored in the object named `n'. There are no sequence points in any of the above expressions. This affects only one thing, and that is the value produced by the second name `rb', in `(rb = ...) * rb'. The compiler is free to obtain that value either before assigning the new value (r + (b = ...)), or after assigning the new value. >Don't fault Chris however. It is very terse and a bit hard to understand, >especially for neophyte C programmers. (I can only assume that Mr. Mitchell means to include me in the group `neophyte C programmers'. I dare say he has not read comp.lang.c for the last five years.) It is indeed difficult. Fortunately, I have some experience with this particular explication. >But I wrote this for SPEED and COMPACTNESS. You have mistaken `compact source representation' for `compact object code'. I imagine that most compilers will produce identical object code for the three-statement version as for the one-statement version--- provided, of course, that they happen to evaluate ((rb = ...) * rb) such that the assignment is done before the second evaluation of rb. The 4BSD VAX PCC, at least, produces the same code, since its notion of expression tree evaluation is such that it does stores immediately, so as to free up registers that may otherwise be tied down holding address locations. (It is worth noting that the proposed C standard is not yet widespread ---not merely because it is not yet a standard---and many compilers are not even constrained to avoid expression rearrangement of the sort that breaks (a-1.0)*b.) Incidentally, if you want real speed, you should consider using Rokicki's algorithm, with expansion (the loop runs only 16 times, so might as well go all the way to 16 expansions and no loop) and perhaps coded in assembly (you can probably manage r>>=1 maybe-or-t in fewer instructions than your compiler uses). Then too, if the machine has a `find most significant one bit' instruction (some floating point conversion instructions will do this for you), you can skip some of the 16 steps as well. In any case, avoiding multiplies will help, since integer multiply is fairly difficult. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris