Path: utzoo!mnetor!geac!torsqnt!tmsoft!robohack!eci386!clewis
From: clewis@eci386.uucp (Chris Lewis)
Newsgroups: can.usrgroup
Subject: Re: C code (fwd)
Message-ID: <1990Mar1.165859.29471@eci386.uucp>
Date: 1 Mar 90 16:58:59 GMT
References: <9002280838.AA01415@cohort.uucp>
Reply-To: clewis@eci386.UUCP (Chris Lewis)
Distribution: ont
Organization: Elegant Communications Inc., Toronto, Canada
Lines: 51

In article <9002280838.AA01415@cohort.uucp> Steve Bird <steve@cohort.UUCP> writes:
|          float a,b;
|          b = 2.0e20 + 1.0;
|          a = b - 2.0e20;
|          printf("%f \n",a);
|
|   When compiled the program returns the number 4008175468544.000000 .
|   Now when the program is modified to read :
|
|          float a,b;
|          b = 2.0e20 + 1.0;
|          a = 2.0e20;
|          printf("%f \n",b - a);
|
|   The program returns 0.000000 . Why ?

Actually, it's truncation error ;-)

The numbers printed in the above cases frequently depend upon the precise
C compiler and processor you're running on.  The explanation I give is
for "traditional" C, according to K&R (ANSI C provides for different
behaviour under certain circumstances):

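When a float is passed to a function or used in an expression, the operand
is first coerced to a double.  Eg: the subtraction in the first fragment
has both arguments coerced to double, and then the result is forced into
a float.  Since floats are usually half the size of doubles, you lose
digits off the least significant end.  The same loss already happened
when 2.0e20 + 1.0 was stored into b, while the constant 2.0e20 in the
subtraction is still a full double, so what gets printed is the error
made in squeezing 2e20 into a float.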
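
Here is the first fragment with the coercions written out by hand.  This
is only a sketch, assuming IEEE single and double formats with
round-to-nearest; the function name first_fragment and the volatile
qualifiers (there to make sure each store really is rounded to float)
are assumptions for illustration, not part of the original program.

```c
/* Hypothetical helper: the first fragment with every conversion explicit. */
double first_fragment(void)
{
    /* 2.0e20 + 1.0 is computed in double, then rounded to fit into b. */
    volatile float b = (float)(2.0e20 + 1.0);

    /* b is widened back to double; the constant 2.0e20 never was a float. */
    double diff = (double)b - 2.0e20;

    /* The result is forced into a float, as in the original assignment. */
    volatile float a = (float)diff;

    /* printf's %f would receive a as a double. */
    return (double)a;
}
```

On a machine with those formats, printf("%f \n", first_fragment()) should
reproduce the 4008175468544.000000 quoted above, which is the distance
between 2e20 and the nearest float.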

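In the second fragment, the subtraction is also done with both as doubles,
but since it is being used as a function argument, it is not truncated
into a float, and it's passed as a double to printf's %f handler.  Both
a and b were already cut down to float when they were assigned; if they
ended up holding the same value, the difference is exactly zero.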
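
A sketch of the second fragment with the conversions spelled out,
again assuming IEEE single and double formats.  The name second_fragment
and the volatile qualifiers are assumptions added for illustration.

```c
/* Hypothetical helper: the second fragment with every conversion explicit. */
double second_fragment(void)
{
    volatile float b = (float)(2.0e20 + 1.0);  /* rounded to float on store */
    volatile float a = (float)2.0e20;          /* rounded to float on store */

    /* Both operands are widened to double; nothing is narrowed afterwards. */
    return (double)b - (double)a;
}
```

If b and a round to the same float, as they should with IEEE formats,
printf("%f \n", second_fragment()) prints 0.000000.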

There are other factors coming into play - on most machines 2e20 + 1
will actually *equal* 2e20, because neither a float (about 7 decimal
digits) nor a typical double (about 16) has the 21 digits of precision
needed to tell the two apart.  (Which is what I suspect your second
example is trying to tell you.)
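
One way to check this on a given machine is to add the small number and
compare.  The helpers below are a hypothetical sketch (the names
float_absorbs and double_absorbs are made up here); the volatile
qualifier is an attempt to keep the compiler from holding the sum in a
wider register than the type asked for.

```c
/* Hypothetical helpers: does adding `small` to `big` change it at all? */
int float_absorbs(float big, float small)
{
    volatile float sum = big + small;   /* forces a real float store */
    return sum == big;
}

int double_absorbs(double big, double small)
{
    volatile double sum = big + small;  /* forces a real double store */
    return sum == big;
}
```

With IEEE formats, float_absorbs(2.0e20f, 1.0f) and
double_absorbs(2.0e20, 1.0) should both return 1, while something like
double_absorbs(1.0e15, 1.0) returns 0 because a double still has room
for the extra 1 at that magnitude.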

Frankly, given that "traditional" C does all floating point operations
and argument passing in doubles, I almost never use a float to store
the result of an FP operation, and only use floats in large arrays.
If you do use floats for the results of FP operations, you should
understand the algorithm well enough to know the magnitudes of the
operands involved.  This sort of thing is still possible with doubles,
but you can get away with more.
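
As a rough illustration of that habit (floats for bulk storage, double
for anything computed), here is a sketch that sums a float array both
ways.  The function names and the contrast with a float accumulator are
assumptions for the example, and the behaviour described afterwards
assumes IEEE single and double formats.

```c
#include <stddef.h>

/* Hypothetical helper: floats for storage, double for the running total. */
double sum_in_double(const float *v, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        total += v[i];           /* each element is widened before the add */
    return total;
}

/* For contrast: the same loop with the total squeezed into a float. */
float sum_in_float(const float *v, size_t n)
{
    volatile float total = 0.0f; /* volatile forces rounding on every store */
    size_t i;

    for (i = 0; i < n; i++)
        total = total + v[i];
    return total;
}
```

Given v = {1.0e8f, 1.0f, 1.0f, 1.0f, 1.0f}, sum_in_double(v, 5) should
come out as 100000004 while sum_in_float(v, 5) stays stuck at 100000000,
since each 1 is too small to register against a float of that size.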

If you make everything double, chances are it'll be faster (less coercing
required), and it will only use significant amounts of space in large
arrays.
-- 
Chris Lewis, Elegant Communications Inc, {uunet!attcan,utzoo}!lsuc!eci386!clewis
Ferret mailing list: eci386!ferret-list, psroff mailing list: eci386!psroff-list