Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!ames!ptsfa!ihnp4!inuxc!pur-ee!j.cc.purdue.edu!h.cc.purdue.edu!ad3
From: ad3@h.cc.purdue.edu.UUCP
Newsgroups: comp.lang.c
Subject: Re: C and Floating Point
Message-ID: <3406@h.cc.purdue.edu>
Date: Mon, 6-Apr-87 16:35:32 EST
Article-I.D.: h.3406
Posted: Mon Apr 6 16:35:32 1987
Date-Received: Wed, 8-Apr-87 02:20:57 EST
References: <15958@sun.uucp> <5716@brl-smoke.ARPA> <14681@cca.CCA.COM>
Reply-To: ad3@h.cc.purdue.edu.UUCP (Mike Brown)
Organization: Purdue University Computing Center
Lines: 69
Keywords: C Fortran Floating Point

In article <14681@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:
>One class of usage for floating point equality has been suggested,
>detection of exact bit patterns. I suggest that this is not advisable,
>both in principle and in practice. I say 'in principle' because you
>are using an operation with one set of semantics to simulate an
>operation with a different set of semantics. I say 'in practice'
>because you can get unexpected results if the bit patterns are
>equivalent to unnormalized floating point numbers.

I've had the "opportunity" to track down and fix exactly this problem
in two major statistical packages that we run on our CDC 6000 systems.
The tale related here is a Fortran rather than a C example. But it
could easily happen with any language.

Both packages were originally developed on IBM systems in the days
before Fortran77. In those days, Fortran didn't have character
variables, so character data had to be stored in numeric variables.
In IBM-land, this data often ended up in REAL-typed variables.
Floating-point comparison of character data may work on IBM systems
(I'm not intimately familiar with IBM data representations and
operations), but it can be a problem on the CDC 6000. A little
background follows...

The CDC 6000 has a 60-bit word size and a 6-bit character size.
The character set includes "A"-"Z" (01-32 octal), "0"-"9" (33-44
octal), blank (55 octal), and a number of other special characters.
Floating point format uses the upper 12 bits (2 characters) for the
exponent and the lower 48 bits (8 characters) for the mantissa.

The problem described here was in the package's command language
decoding. The command language input is broken into tokens, which
were stored left-justified with blank fill in floating point
variables. Parsing the input naturally includes checking these tokens
to try to recognize command language keywords. One of these keywords
is "TO", and the user had a variable named "TRE". These are stored
internally as:

    24 17 55 55 55 55 55 55 55 55    TO
    24 22 05 55 55 55 55 55 55 55    TRE

(Each 6-bit character is shown as two octal digits.)

So, how do we compare floating point quantities? The compiler can't
assume that the data will be normalized, so it can't generate code to
do a bitwise comparison. Instead, the generated code subtracts one
"number" from the other and compares the result to 0. In doing the
subtraction, the hardware adjusts the number with the smaller exponent
so that the exponents match. This exponent adjustment must be
compensated for by shifting the mantissa right, so that the adjusted
number keeps the same value (apart from any low-order bits shifted
off the end).

So what happens in this particular case? "TO" has the smaller
exponent (24 17), so its exponent is incremented by 3, making it match
the other (24 22). This must be compensated for by shifting the
mantissa (55 55 55 ...) right 3 bits, making it (05 55 55 ...).
Putting it all together, the hardware has transformed "TO" into "TRE",
and they'll naturally compare equal.

Several general points should be made here:

- Sometimes the base data types provided by a language don't match
  the application's view of the data, and you have to choose one of
  the available data types.

- You should be aware of the limitations of the various forms of data
  representation, so that you can make a good choice.
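
For anyone who wants to see the arithmetic, the alignment step
described above can be sketched in a few lines of Python. This is only
a rough model, not the real CDC hardware: it assumes an unsigned
12-bit exponent and an unsigned 48-bit mantissa, ignores sign, bias,
ones' complement and rounding, and simply discards bits shifted off
the right. The names display_code, pack_token, split_float and
float_equal are made up for this illustration.

```python
# Rough model of the subtract-and-test-zero comparison described in the post.
# ASSUMPTIONS (not from the post): unsigned exponent and mantissa, no sign,
# bias, or ones'-complement handling; bits shifted out are simply dropped.

BLANK = 0o55  # display code for blank, as given in the post


def display_code(ch):
    """Return the 6-bit code for 'A'-'Z' (01-32 octal) or blank (55 octal)."""
    if ch == " ":
        return BLANK
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 1
    raise ValueError("only uppercase letters and blank are modelled here")


def pack_token(token):
    """Pack a token left-justified with blank fill into one 60-bit word."""
    if len(token) > 10:
        raise ValueError("a 60-bit word holds at most 10 characters")
    word = 0
    for ch in token.ljust(10):
        word = (word << 6) | display_code(ch)
    return word


def split_float(word):
    """Split a word into (upper 12 bits, lower 48 bits)."""
    return word >> 48, word & ((1 << 48) - 1)


def float_equal(a, b):
    """Compare two words the way the post describes: align, subtract, test 0."""
    exp_a, man_a = split_float(a)
    exp_b, man_b = split_float(b)
    # The operand with the smaller exponent has its mantissa shifted right
    # by the exponent difference so that both exponents match.
    if exp_a < exp_b:
        man_a >>= exp_b - exp_a
    else:
        man_b >>= exp_a - exp_b
    return man_a - man_b == 0


to = pack_token("TO")
tre = pack_token("TRE")
print(oct(to))
print(oct(tre))
print(to == tre)             # False: the bit patterns differ
print(float_equal(to, tre))  # True: after alignment the mantissas match
```

An integer (bitwise) comparison of the two words tells them apart; the
modelled floating point comparison does not, which is the point of the
story.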
======================================================================
--
Mike Brown, Systems Programmer       ARPANET: ad3@j.cc.Purdue.EDU
Purdue University Computing Center   BITNET:  AD3@PURCCVM
Mathematical Sciences Building       USENET:  ad3@pucc-j.UUCP
West Lafayette, IN 47907             Phone:   (317) 494-1787