Path: utzoo!utgpu!watserv1!watmath!att!tut.cis.ohio-state.edu!cs.utexas.edu!uunet!convex!garzione
From: garzione@convex.com (Michael Garzione)
Newsgroups: comp.lang.c
Subject: Re: Inherent imprecision of floating point variables
Message-ID: <103379@convex.convex.com>
Date: 26 Jun 90 12:53:38 GMT
References: <3300@crash.cts.com>
Sender: usenet@convex.com
Lines: 50

rond@pro-grouch.cts.com (Ron Dippold) writes:

>In-Reply-To: message from pariyana@hawk.ulowell.edu
>> I have a question regarding floating point variables. I have a
>> program that requires exact precision in all computations.
>> Floating point variables have given me trouble. For example, when
>> I enter 32.14 into a floating point variable, then print it out, it
>> prints 32.139998. This problem is magnified in all computations.
>
>Check out the types that your compiler supports. Many support a "double
>float" or something similar with much more precision. Some also support IEEE
>floating-point numbers, which use 80 bits.

>UUCP: crash!pro-grouch!rond
>ARPA: crash!pro-grouch!rond@nosc.mil
>INET: rond@pro-grouch.cts.com

You will never get "exact" precision with any floating point computation
if any of the numbers used cannot be represented in base 2 exactly
(assuming the exponent in the floating point number is represented as a
power of 2 and the mantissa is also in base 2 and of finite length).

The error you see above has three sources.

(1) 32.14 (base 10) has no finite expansion in base 2 (it is not a sum
    of finitely many powers of 2), so it does not have an exact
    representation as a floating point number.

(2) If you say something like "float i = 32.14" you get another
    potential error (in this case it's just a result of #1) as the
    compiler or input routines convert a base 10 _printed_
    representation of a number to its binary equivalent (base 2
    mantissa and exponent).

(3) Now that you've got an approximation of 32.14, printing it out
    introduces still another potential error, in converting the binary
    representation back to base 10 for printing.

(From Knuth, Vol II, p. 200: "Note: Since floating point arithmetic is
inherently approximate...")

If you _really_ need *exact* precision you may want to consider using a
representation where non-integral numbers are represented as the ratio
of two integral numbers and use multiple precision arithmetic on them.
You'll be guaranteed exact results for addition, subtraction,
multiplication and division, but it's probably not worth the hassle.

Hope this helps.

Mike

Convex Computer Corporation
{uunet,sun}!convex!garzione
garzione@convex.com