Path: utzoo!attcan!uunet!munnari.oz.au!cs.mu.oz.au!ok From: ok@cs.mu.oz.au (Richard O'Keefe) Newsgroups: comp.lang.prolog Subject: Re: Floating Point Message-ID: <2565@munnari.oz.au> Date: 29 Oct 89 07:25:44 GMT References: <2491@munnari.oz.au> <36169@srcsip.UUCP> <2524@munnari.oz.au> <19165@brunix.UUCP> Sender: news@cs.mu.oz.au Lines: 114 In article <19165@brunix.UUCP>, mj@brunix (Mark Johnson) writes: > I guess what I really should have said is "couldn't interval arithmetic > be used to provide _complete_ floating point unification?", where by > complete I mean that if expressions E1 and E2 denote the same real > number, then E1 = E2 (or, since 'is' is the numerical equality predicate > in Prolog, I guess that should be E1 is E2). No, 'is' is _not_ the numerical equality predicate in Prolog. The numerical equality predicate in Prolog is '=:='. Furthermore, what numerical equality does and what unification does are two separate things. For example, 1 =:= 1.0, but it would be a serious blunder to have 1 = 1.0. (Yes, that means that I regard the treatment of numbers in C Prolog as a serious blunder.) The popular mistake (the BSI/ISO committee keep making it too) is to talk about REAL numbers. The real numbers are a particular algebraic structure with lots of nice properties. It makes sense to consider the integers as a subring of the field of reals. But there are very few programming languages which provide an approximation to the reals. (I've heard of two.) What programming languages give you is FLOATING-POINT numbers. You can't regard the integers as a subring of the floats because (a) the floats aren't a ring to start with, and (b) in some valid implementations the machine integers cannot be embedded in the machine floats (consider Fortran 77). Each different floating-point system provides a different algebra. Typically, this algebra is not a subalgebra of the field of real numbers. To take three examples: - the IBM/370 floating point system contains "unnormalised" numbers, so that several different floating-point values may represent the same number. These different values BEHAVE differently in certain contexts. Although every IBM/370 float represents a number, floats do not form a total order. - the VAX-11 floating point system contains "illegal operands" which do not represent any number. Apart from illegal operands, VAX floats do form a total order. - the IEEE 754 and IEEE 854 standard floating point systems contain "infinity" values and other values which do not represent any number. IEEE numbers do not form a total order. Now floating point calculations may be considered as approximations of 'real' calculations, but there is no vagueness or uncertainty about them. Let X and Y be particular floats, then there is at most one Z such that Z = X (+) Y. If Z' is any other float, then it is absolutely certain that Z' is *not* X (+) Y. The IEEE 754 and IEEE 854 standards spell out precisely which values *must* be yielded by any given floating-point computation. The IEEE 754 algebra is as definite and unambiguous as the integers. There is no approximation involved! If I ask for eps <- scalb(1.0, -50) % Eps = 2.0**(-50) one_plus_eps <- 1.0 (+) eps delta <- (one_plus_eps (*) one_plus_eps) (-) 1.0 then in IEEE 754 double precision arithmetic delta is PRECISELY defined; the value 2*eps which I get back is not an approximation, it is the only possible correct answer. If this were a computation on the reals, the only possible correct answer for delta would be eps(2+eps). As it happens, that number can be represented exactly in IEEE 754 double precision arithmetic and is detectably different from the value 2eps. The plain fact of the matter is that the calculation has one correct answer (2eps) in one algebra (IEEE 754) and another correct answer (eps(2+eps)) in another algebra (the reals). A system which claimed to offer IEEE 754 arithmetic and delivered any other answer than 2eps for this computation would be WRONG. > Now I'm just a lowly linguist, but it seems to me that the inherent > inaccuracy in (some) floating point calculations will mean that any > floating point implementation of real arithmetic will be unsound (in > the sense that E1 =:= E2 is true even though E1 and E2 don't denote > the same real number). It is simply a MISTAKE to think of floating-point numbers as denoting real numbers. It is indeed the case that 1.0 + scalb(1.0, -60) =:= 1.0 must be true in IEEE 754 arithmetic and must be false in the reals, but all this tells us is that floating-point arithmetic is not the same algebra as real arithmetic, which we already knew. A floating-point system which conforms to the IEEE 754 standard MUST yield as the result of 1.0 + scalb(1.0, -60) a number which cannot be distinguished in any way from 1.0, so in this case although E1 and E2 would not evaluate to the same real number they MUST evaluate to the same floating-point number, and it would be an error not to regard their floating-point values as equal. > On the other hand, as far as I can see, the interval arithmetic > approach is complete, If you consider an interval as a pair of constraints on a real number, interval arithmetic is sound but not complete. Amongst other things, if we assume that multiplication and division by powers of 16 are exact (which they are in VAX, IEEE, and IBM arithmetic), then (X * 16.0) / 16.0 will yield X exactly, but interval arithmetic will regard the 16.0s as imprecise and will return a wider interval than the X interval it started with. The standard example here is (X+1)*(X+1) - (X*X + 1) -- here I assume that the interval arithmetic treats integers as precise -- which should yield [2Xlo,2Xhi] but will yield a rather wider interval as it doesn't realise that the operands are correlated. The basic point is that the connection between a floating-point calculation and the "corresponding" real calculation is not something that comes free. "floating point expression E' yields an acceptable approximation to the value of real expression E" is not something which any programming language can guarantee you. No amount of fiddling with equality is going to do it. Indeed, fiddling with equality is counterproductive: there are several cases where you _can_ get good results from floating-point arithmetic IF your program is given information about the floating-point system it is using, including precise comparison of floats. If you want your floating-point calculations to yield useful answers, you have to learn some numerical analysis yourself or have that part of your program written by someone who does know some numerical analysis. It is not enough to switch over to interval arithmetic either; that will reduce the number of inappropriate results you get by reducing the number of results you get at all.