Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!ames!ptsfa!ihnp4!inuxc!pur-ee!j.cc.purdue.edu!h.cc.purdue.edu!ad3
From: ad3@h.cc.purdue.edu.UUCP
Newsgroups: comp.lang.c
Subject: Re: C and Floating Point
Message-ID: <3406@h.cc.purdue.edu>
Date: Mon, 6-Apr-87 16:35:32 EST
Article-I.D.: h.3406
Posted: Mon Apr  6 16:35:32 1987
Date-Received: Wed, 8-Apr-87 02:20:57 EST
References: <15958@sun.uucp> <5716@brl-smoke.ARPA> <14681@cca.CCA.COM>
Reply-To: ad3@h.cc.purdue.edu.UUCP (Mike Brown)
Organization: Purdue University Computing Center
Lines: 69
Keywords: C Fortran Floating Point

In article <14681@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:
>One class of usage for floating point equality has been suggested,
>detection of exact bit patterns.  I suggest that this is not advisable,
>both in principle and in practice.  I say 'in principle' because you
>are using an operation with one set of semantics to simulate an
>operation with a different set of semantics.  I say 'in practice'
>because you can get unexpected results if the bit patterns are
>equivalent to unnormalized floating point numbers.

I've had the "opportunity" to track down and fix exactly this problem
in two major statistical packages that we run on our CDC 6000 systems.
The tale related here is a Fortran example rather than a C one, but
the same thing could easily happen in any language.

Both packages were originally developed on IBM systems in the days
before Fortran77.  In those days, Fortran didn't have character
variables, so character data had to be stored in numeric variables.  In
IBM-land, this data often ended up in REAL-typed variables.

Floating-point comparison of character data may work on IBM systems
(I'm not intimately familiar with IBM data representations and
operations), but it can be a problem on the CDC 6000.  A little
background follows...  The CDC 6000 has a 60-bit word size and a 6-bit
character size.  The character set includes "A"-"Z" (01-32 octal),
"0"-"9" (33-44 octal), blank (55 octal), and a number of other special
characters.  Floating point format uses the upper 12 bits (2
characters) for the exponent and the lower 48 bits (8 characters) for
the mantissa.

The problem described here was in the package's command language
decoding.  The command language input was broken into tokens, which
were stored left-justified with blank fill in floating point variables.
Parsing naturally included comparing these tokens against the command
language keywords.  One of these keywords was "TO", and the user had a
variable named "TRE".  These were stored internally as:
    24 17 55 55 55 55 55 55 55 55    TO
    24 22 05 55 55 55 55 55 55 55    TRE
(All character codes are in octal, one 6-bit character per pair of
digits.)

So, how do we compare floating point quantities?  The compiler can't
assume that the data will be normalized, so it can't generate code to
do a bitwise comparison.  Instead, the generated code subtracts one
"number" from the other and compares the result to 0.  In doing the
subtraction, the hardware adjusts the number with the smaller exponent
so that the exponents match.  Raising the exponent must be compensated
for by shifting the mantissa right, so that the adjusted number keeps
the same value (apart from any low-order bits shifted out).

So what happens in this particular case?  "TO" has the smaller exponent
(24 17), so its exponent is incremented by 3, making it match the other
(24 22).  To compensate, the mantissa (55 55 55 ...) is shifted right
3 bits, making it (05 55 55 ...).  Putting it all together, the
hardware has transformed "TO" into "TRE", so the difference is zero and
the two tokens compare equal.

Two general points should be made here:
- Sometimes the base data types provided by a language don't match the
  application's view of the data, and you have to choose one of the
  available data types.
- You should be aware of the limitations of the various forms of data
  representation, so that you can make a good choice.
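
For C programmers, the same mismatch can be seen without a CDC.  The
sketch below assumes a machine with IEEE 754 doubles (an assumption,
not something C guarantees) and uses a made-up helper, same_bits, to
compare the stored representation with memcmp instead of ==.  It shows
the difference in the other direction: two different bit patterns that
the floating point comparison calls equal.

```c
#include <string.h>

/* Compare the stored bytes of two doubles rather than their
   floating point values (hypothetical helper). */
int same_bits(double a, double b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Floating point comparison of the same two operands, for contrast. */
int same_value(double a, double b)
{
    return a == b;
}
```

With IEEE arithmetic, same_value(0.0, -0.0) is true while
same_bits(0.0, -0.0) is false, so anyone who really wants "are these
the same bits" should ask that question of the bits, or better, keep
such data in an integer or character type in the first place.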


======================================================================
-- 
Mike Brown, Systems Programmer		ARPANET: ad3@j.cc.Purdue.EDU
Purdue University Computing Center	BITNET:  AD3@PURCCVM
Mathematical Sciences Building		USENET:  ad3@pucc-j.UUCP
West Lafayette, IN 47907		Phone:   (317) 494-1787