Path: utzoo!attcan!uunet!munnari.oz.au!cs.mu.oz.au!ok
From: ok@cs.mu.oz.au (Richard O'Keefe)
Newsgroups: comp.lang.prolog
Subject: Re: Floating Point
Message-ID: <2565@munnari.oz.au>
Date: 29 Oct 89 07:25:44 GMT
References: <2491@munnari.oz.au> <36169@srcsip.UUCP> <2524@munnari.oz.au> <19165@brunix.UUCP>
Sender: news@cs.mu.oz.au
Lines: 114

In article <19165@brunix.UUCP>, mj@brunix (Mark Johnson) writes:
> I guess what I really should have said is "couldn't interval arithmetic
> be used to provide _complete_ floating point unification?", where by
> complete I mean that if expressions E1 and E2 denote the same real
> number, then E1 = E2 (or, since 'is' is the numerical equality predicate
> in Prolog, I guess that should be E1 is E2).

No, 'is' is _not_ the numerical equality predicate in Prolog.
The numerical equality predicate in Prolog is '=:='.
Furthermore, what numerical equality does and what unification does
are two separate things.  For example, 1 =:= 1.0, but it would be a
serious blunder to have 1 = 1.0.  (Yes, that means that I regard the
treatment of numbers in C Prolog as a serious blunder.)

The popular mistake (the BSI/ISO committee keep making it too) is to
talk about REAL numbers.  The real numbers are a particular algebraic
structure with lots of nice properties.  It makes sense to consider
the integers as a subring of the field of reals.

But there are very few programming languages which provide an approximation
to the reals.  (I've heard of two.)  What programming languages give you is
FLOATING-POINT numbers.  You can't regard the integers as a subring of the
floats because (a) the floats aren't a ring to start with, and (b) in some
valid implementations the machine integers cannot be embedded in the machine
floats (consider Fortran 77).

Each different floating-point system provides a different algebra.
Typically, this algebra is not a subalgebra of the field of real
numbers.  To take three examples:
	- the IBM/370 floating point system contains "unnormalised" numbers,
	  so that several different floating-point values may represent the
	  same number.  These different values BEHAVE differently in
	  certain contexts.  Although every IBM/370 float represents a
	  number, floats do not form a total order.
	- the VAX-11 floating point system contains "illegal operands"
	  which do not represent any number.  Apart from illegal operands,
	  VAX floats do form a total order.
	- the IEEE 754 and IEEE 854 standard floating point systems
	  contain "infinity" values and other values which do not represent
	  any number.  IEEE numbers do not form a total order.

Now floating point calculations may be considered as approximations of
'real' calculations, but there is no vagueness or uncertainty about them.
Let X and Y be particular floats, then there is at most one Z such that
Z = X (+) Y.  If Z' is any other float, then it is absolutely certain
that Z' is *not* X (+) Y.  The IEEE 754 and IEEE 854 standards spell out
precisely which values *must* be yielded by any given floating-point
computation.  The IEEE 754 algebra is as definite and unambiguous as
the integers.  There is no approximation involved!  If I ask for
	eps <- scalb(1.0, -50)	% Eps = 2.0**(-50)
	one_plus_eps <- 1.0 (+) eps
	delta <- (one_plus_eps (*) one_plus_eps) (-) 1.0
then in IEEE 754 double precision arithmetic delta is PRECISELY
defined; the value 2*eps which I get back is not an approximation,
it is the only possible correct answer.  If this were a computation
on the reals, the only possible correct answer for delta would be
eps(2+eps).  As it happens, that number can be represented exactly in
IEEE 754 double precision arithmetic and is detectably different from
the value 2eps.  The plain fact of the matter is that the calculation
has one correct answer (2eps) in one algebra (IEEE 754) and
another correct answer (eps(2+eps)) in another algebra (the reals).
A system which claimed to offer IEEE 754 arithmetic and delivered
any other answer than 2eps for this computation would be WRONG.

> Now I'm just a lowly linguist, but it seems to me that the inherent
> inaccuracy in (some) floating point calculations will mean that any
> floating point implementation of real arithmetic will be unsound (in
> the sense that E1 =:= E2 is true even though E1 and E2 don't denote
> the same real number).

It is simply a MISTAKE to think of floating-point numbers as
denoting real numbers.  It is indeed the case that
	1.0 + scalb(1.0, -60) =:= 1.0
must be true in IEEE 754 arithmetic and must be false in the reals, but
all this tells us is that floating-point arithmetic is not the same
algebra as real arithmetic, which we already knew.  A floating-point
system which conforms to the IEEE 754 standard MUST yield as the result
of 1.0 + scalb(1.0, -60) a number which cannot be distinguished in any
way from 1.0, so in this case although E1 and E2 would not evaluate to
the same real number they MUST evaluate to the same floating-point
number, and it would be an error not to regard their floating-point
values as equal.

> On the other hand, as far as I can see, the interval arithmetic
> approach is complete,

If you consider an interval as a pair of constraints on a real number,
interval arithmetic is sound but not complete.  Amongst other things,
if we assume that multiplication and division by powers of 16 are
exact (which they are in VAX, IEEE, and IBM arithmetic), then
	(X * 16.0) / 16.0
will yield X exactly, but interval arithmetic will regard the 16.0s as
imprecise and will return a wider interval than the X interval it started with.
The standard example here is (X+1)*(X+1) - (X*X + 1)
-- here I assume that the interval arithmetic treats integers as precise --
which should yield [2Xlo,2Xhi] but will yield a rather wider interval as
it doesn't realise that the operands are correlated.

The basic point is that the connection between a floating-point calculation
and the "corresponding" real calculation is not something that comes free.
"floating point expression E' yields an acceptable approximation to the
value of real expression E" is not something which any programming language
can guarantee you.  No amount of fiddling with equality is going to do it.
Indeed, fiddling with equality is counterproductive:  there are several
cases where you _can_ get good results from floating-point arithmetic IF
your program is given information about the floating-point system it is
using, including precise comparison of floats.

If you want your floating-point calculations to yield useful answers, you
have to learn some numerical analysis yourself or have that part of your
program written by someone who does know some numerical analysis.  It is
not enough to switch over to interval arithmetic either; that will reduce
the number of inappropriate results you get by reducing the number of
results you get at all.