Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: 386 Clones [really: IEEE floating point & various approaches; long]
Message-ID: <42597@mips.mips.COM>
Date: 1 Nov 90 02:56:12 GMT
References: <1990Oct26.015244.586@amd.com> <8464@scolex.sco.COM> <2816@crdos1.crd.ge.COM> <2451@charon.cwi.nl>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 132

In article <2451@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
...
>The true Cray and Convex users do not care about IEEE.  Their machines do
>not have IEEE conformant arithmetic; far from it.  They want results; fast.
>They do not care about correctness.  :-)

Note, also, that this somewhat applies to the IBM RS/6000 series.
Legally, but unlike almost all other IEEE FP implementations,
FP OPERATIONS DO NOT NORMALLY TRAP ON EXCEPTION CONDITIONS;
i.e., SIGFPE doesn't normally do anything.

Your choices are:
	a) Run the code in a mode where FP operations are sequentialized,
	which of course seriously degrades performance, but gives
	precise exceptions. (good for debug)
	b) Make explicit calls in your code to routines that test the
	status of the various flags. (probably not favored by programmers,
	especially since it de-portabilizes otherwise-portable code.)
	c) (I'm not sure if this is shipped yet, or not; does anybody KNOW?):
	use compiler switches that generate the relevant tests, e.g., at
	end of statement, end of function, or program exit.

There are some reasonable reasons for this, and I attach the relevant
quotes from the IBM J of R & D that describes the RS/6000.
HOWEVER, people may want to be warned that code ported from other machines
may NOT trap exceptions the way that most other IEEE machines do.
(I would assume that if you trap SIGFPE, that the runtime at least, on
program exit, tells you that there has been an exception "sometime".
If it doesn't do this, I suspect there is a bunch of code out there
that thought it was protecting itself, and didn't.)

Here's what the IBM man page says:
  fp_clr_flag, fp_set_flag, fp_read_flag, or fp_swap_flag 
.....
    Description
  
      The RISC System/6000 currently does not generate an interrupt
  for floating-point exceptions.  Therefore, the common  method  of
  catching  the  signal SIGFPE  and  calling  an  appropriate  trap
  handler to identify a floating-point exception is not supported.
  
      These subroutines aid in determining when an exception has
  occurred and the exception type.  These subroutines can be called
  explicitly  around blocks of code that may cause a floating-point
  exception.

Then, here is some explanation (long):

From IBM J Res & Dev, Vol 34, No 1, January 1990 (IBM RS/6000 issue)

p.33-34 (CAPITALS MINE)
"Another very important aspect of fully exploiting floating-point performance
is the method of presentation of floating-point exceptions and the precision
in identifying the instructions that cause floating-point exceptions.
Exceptions are a natural and perhaps expected consequence of floating-point
operations, and most can be handled by default rules.  (Default exception
handling is defined by the IEEE standard.)  These default rules can be
managed completely in hardware and require no program intervention after
initialization.

The IEEE default rules do not always provide the desired result, however.
Since the standard allows for a program fix-up after an exception, the
architecture problem is then to define a mechanism to permit program fix-up.
The most straightforward approach is to specify that a floating-point
interrupt at the failing instruction will occur whenever there is a
floating-point exception that is not defaulted.  The hardware implication
of this is that all instructions after a floating point instruction must
be conditional until it is known that no exceptions are possible on that
instruction.  Some floating point instructions take many cycles, and exceptions
may not be known until the last cycle of the instruction.  Therefore, most
implementations would serialize on floating-point instructions-if not the
first, then the second; if not all, then some.  The inclusion of a floating-
point interrupt would sacrifice much of the potential floating-point
performance.

AN ALTERNATIVE STRATEGY IS NOT TO REPORT AN INTERRUPT AT ALL, BUT SIMPLY TO
SET A BIT INDICATING THAT A FLOATING-POINT EXCEPTION HAS OCCURRED.  IT IS
THEN UP TO A PROGRAM TO TEST FOR FLOATING-POINT EXCEPTIONS.  Different
compiler strategies can be used as to where it is appropriate to test for
these exceptions.  Since the definition of the exception also includes the
setting of summary information, it is possible to test at the end of a program,
at the end of a subprogram, or at the end of a statement where a floating-point
operation was used.  This level of precision can be controlled by
linker/compiler option.  None of these tell exactly where the exception
occurred; they simply identify where it occurred.  In most cases, this
information is sufficient.

However, if the exact failing instruction must be known, there are two
possible strategies.  One can insert a test for the exception after each
floating-point instruction, or one can tag each queued and/or executing
floating-point instruction with its address.  Inserting code to test for
every possible exception is yet another mode for the compiler to manage,
necessitates recompilation, and can significantly expand execution time.
Address tagging of "active" floating-point instructions identifies the
failing instruction exactly.  However, it does require that the
implementation keep track of the address tags.  Moreover, it is not
synchronous; that is, if an exception occurs, the location of the
failing instruction is reported, but not before the program
has gone beyond that point.  Fix-up may still be possible, but in general
this method only permits localization of the failing instruction.
Consider the case of the inner-loop product described in Figure 7.
This loop consists of two floating-point loads, one floating-point
multiply-add, and one branch.  The "active" floating-point instructions
will all be instances of the same multiply-add instruction.  If an
exception occurs, what is known is the address of the instruction, not
the iteration number.  The benefit of this approach is speed; floating-point
performance is not limited by exception recognition.  The drawback, as
outlined above, is the precision with which the fault is determined.

RISC System/6000 architecture adopted a two-part strategy.  THE PRINCIPAL
APPROACH WOULD BE TEST-CODE INSERTION, with the compilers able to insert
such code at the statement or (sub)program level.  The linker also
supports the enabling of test code at program exit, ensuring the ability 
to report a floating-point exception if it occurs anywhere within the
program.

To avoid recompilation in order to identify the failing operation exactly,
the architecture also adopted a synchronize mode, in which an interrupt
can be generated, identifying the failing instruction by running the
machine with one floating-point instruction dispatched at a time.  This
technique has the same weakness as code insertion; THAT IS, FLOATING-POINT
PERFORMANCE IS GREATLY REDUCED.  However, it may not be as bad as code
insertion, because the synchronization can be managed by hardware rather than
by extra code inserted by software.  IT IS EXPECTED THAT THE MODE WILL ONLY
BE USED BY CERTAIN PROGRAMS AND THEN ONLY TO DEBUG THEIR ALGORITHMS."
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086