Path: utzoo!attcan!uunet!munnari.oz.au!goanna!ok
From: ok@goanna.oz.au (Richard O'keefe)
Newsgroups: comp.arch
Subject: Re: floating point formats -- usage of small floats
Keywords: floating point formats, floating point usage, Lisp, heaps
Message-ID: <2944@goanna.oz.au>
Date: 5 Mar 90 21:11:06 GMT
References: <3880@uceng.UC.EDU> <1990Mar5.031003.12107@Neon.Stanford.EDU>
Organization: Comp Sci, RMIT, Melbourne, Australia
Lines: 62

In article <1990Mar5.031003.12107@Neon.Stanford.EDU>,
 wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
> [ wants to implement short-floats; thinks it would be cool to take bits
>   out of the exponent; wants to know how many ]

> I figure that I can just lop a couple of bits off of the exponent part, and
> leave the fraction part alone.  That way, I sacrifice range of floats
> but not precision.

The snag with that is that you sacrifice *all* the precision at the
ends of your ranges.  The IEEE-754 formats are very carefully balanced;
there are actually rules that tell you what would be good sizes for the
fields.  Obviously, a short-float cannot be IEEE-754, but you *could*
try to see how many of the constraints listed in IEEE-854 you can
satisfy.

To give you some idea of the kind of thinking that is involved:
suppose you have N effective mantissa bits (counting the hidden bit).
You would like nextafter(1.0,/*in the direction of*/ 2.0) - 1.0
(that is, the next number just bigger than 1.0, minus 1.0) to be
representable; that's your epsilon.  So you want 2**(1-N) to be
representable, and it's a general rule that the exponent range should
be roughly symmetric, so you want the range to be at least 1-N..N-1.
That is 2N-1 exponent values, so you need approx ceil(log2(N)+1)
exponent bits.  For N=24 (exponents -23..23, 47 values), that means
an absolute minimum of 6 exponent bits.  More than that,
you'd really like an epsilon's worth of epsilon, so you want at least
ceil(log2(N)+2) exponent bits, and now we're just 1 short of 754's
8 exponent bits.
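
The arithmetic above is easy to check mechanically.  Here is a rough
Python sketch; the function name min_exponent_bits and its "extra"
parameter are made up for illustration, and the last lines only confirm
that the host's double-precision epsilon is 2**(1-N) as described.

```python
import math
import sys

def min_exponent_bits(n_mantissa_bits, extra=1):
    """Rule of thumb from the text: ceil(log2(N) + extra).

    extra=1 makes epsilon = 2**(1-N) representable with a roughly
    symmetric exponent range; extra=2 gives the more generous
    "epsilon's worth of epsilon" figure.
    """
    return math.ceil(math.log2(n_mantissa_bits) + extra)

# N = 24 is the effective single-precision significand (hidden bit included).
bare_minimum = min_exponent_bits(24)          # ceil(log2(24) + 1)
more_comfortable = min_exponent_bits(24, 2)   # ceil(log2(24) + 2)

# Sanity check of the epsilon claim on the host's doubles:
# nextafter(1.0, 2.0) - 1.0 should equal 2**(1-N).
n_double = sys.float_info.mant_dig
epsilon_matches = sys.float_info.epsilon == 2.0 ** (1 - n_double)
```

With N=24 this gives 6 and 7 bits respectively, matching the figures
worked out by hand.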

It is much better to steal bits out of the significand field.
There are several Prolog systems that do it; however that approach
is losing its popularity.  It was ok when you were just entering
co-ordinates for graphics, but for serious calculation anything
less than the usual 32 bits is unsafe in unskilled hands.

If you do decide to steal bits out of the significand field,
make a conscious design decision about whether you want to preserve
the "graceful underflow" property of the IEEE standards or not.  If
you do, you'd have quite a reasonable system, but the conversions
are not as simple.  If you don't, the conversions are simpler, but
the resulting arithmetic system isn't much like IEEE.

That's what most people have talked about.
But it's *not* what Paul Wilson *asked* about.
His proposal is:

> If a float result goes out of that range, I'll make
> a full-sized floating point object on the heap, and use a tagged
> pointer to it.

This is a really neat idea.  I like it a LOT.  He's not actually
proposing to change the properties of the arithmetic system at all,
merely the representation in memory.  (Rather like the VAX tiny-float
immediate operands.)
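
To make the idea concrete, here is a rough Python sketch of one way it
might go, NOT Paul Wilson's actual design.  Everything specific is an
assumption of mine: a 2-bit tag in the low bits, the tag values, an
IEEE single as the full-sized float, a 6-bit rebiased exponent, and a
Python list standing in for the heap.  Zero is kept immediate by
reserving short exponent 0; denormals, infinities and NaNs simply get
boxed.

```python
import struct

TAG_BITS = 2                  # hypothetical tag width in the low bits
TAG_MASK = (1 << TAG_BITS) - 1
TAG_SHORT = 0b01              # hypothetical tag: immediate short-float
TAG_BOXED = 0b10              # hypothetical tag: index of a heap float
SHORT_EXP_BITS = 8 - TAG_BITS                    # 6 exponent bits remain
EXP_OFFSET = 127 - (1 << (SHORT_EXP_BITS - 1))   # 95: rebias 8-bit to 6-bit
MAX_SHORT_EXP = (1 << SHORT_EXP_BITS) - 1        # 63

def encode_float(x, heap):
    """Return a tagged 32-bit word; box on `heap` when the exponent won't fit."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign, e, frac = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if e == 0 and frac == 0:
        short_e = 0                        # +0.0 or -0.0, reserved code
    elif 1 <= e - EXP_OFFSET <= MAX_SHORT_EXP:
        short_e = e - EXP_OFFSET           # unbiased exponent in -31..31
    else:
        heap.append(bits)                  # full-sized float on the "heap"
        return ((len(heap) - 1) << TAG_BITS) | TAG_BOXED
    word = (sign << 29) | (short_e << 23) | frac   # fraction kept whole
    return (word << TAG_BITS) | TAG_SHORT

def is_immediate(w):
    return (w & TAG_MASK) == TAG_SHORT

def decode_float(w, heap):
    """Recover the original single-precision value from a tagged word."""
    if is_immediate(w):
        word = w >> TAG_BITS
        sign = word >> 29
        short_e = (word >> 23) & MAX_SHORT_EXP
        frac = word & 0x7FFFFF
        e = 0 if short_e == 0 else short_e + EXP_OFFSET
        bits = (sign << 31) | (e << 23) | frac
    else:
        bits = heap[w >> TAG_BITS]
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

The point to notice is that decode_float always hands back the same
single-precision value that went in, so the arithmetic never sees the
difference; only the storage cost varies with the magnitude.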

Presumably DEC picked the range of their tiny-floats based on
some expectation of the range of constants in programs; if your
coding covers that, you're doing at least as well as the VAX.

A suggestion:  why not scan through CALGO (ACM Collected Algorithms)
and issues of JRSS Series C and such things to see what some typical
ranges are like.  I've had ranges of 10**9 or more in my code, but so what?
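
If you do collect such constants, a few lines of Python will tell you
what exponent range they span and what share would stay immediate for
a given exponent width.  This is only a sketch: the function names are
made up, and it assumes a symmetric unbiased range of
-(2**(bits-1) - 1) .. 2**(bits-1) - 1 with zero always fitting.

```python
import math

def binary_exponent(v):
    """Unbiased IEEE-style exponent: v = m * 2**e with 1 <= |m| < 2."""
    return math.frexp(v)[1] - 1      # frexp uses 0.5 <= |m| < 1, hence the -1

def exponent_range(values):
    """(min, max) binary exponent over the nonzero finite values, or None."""
    exps = [binary_exponent(v) for v in values
            if v != 0.0 and math.isfinite(v)]
    return (min(exps), max(exps)) if exps else None

def fraction_fitting(values, exp_bits):
    """Share of values whose exponent fits a symmetric exp_bits-wide range."""
    limit = (1 << (exp_bits - 1)) - 1
    def fits(v):
        if v == 0.0:
            return True              # assume zero gets a reserved encoding
        return math.isfinite(v) and abs(binary_exponent(v)) <= limit
    values = list(values)
    return sum(1 for v in values if fits(v)) / len(values)
```

For what it's worth, 10**9 lies between 2**29 and 2**30, so a constant
of that size needs a binary exponent of only 29 and would still fit in
a 6-bit exponent under these assumptions.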