Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!samsung!uunet!mcsun!hp4nl!cwi.nl!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: comp.arch
Subject: Changing IEEE rounding modes on the fly (was Re: somthing else)
Message-ID: <3751@charon.cwi.nl>
Date: 21 Jun 91 00:59:00 GMT
References: <9106190449.AA02871@ucbvax.Berkeley.EDU> <1991Jun19.165150.2121@shinobu.sgi.com>
Sender: news@cwi.nl
Organization: CWI, Amsterdam
Lines: 82

The issue of rounding modes, and whether changing them takes a long time,
is evidently a bit more intricate than I thought at first.

In a previous posting I gave timings (from the manual) for the 88100, but
apparently the fp pipelines must be flushed before a change to the fp
control register is performed.  So I was wrong, and the 88100 does not
meet the factor 3 criterion.  (Alas, this is not documented, and the flush
must be programmed!)

What disturbs me is that this holds not only for a change of rounding mode,
but also for the way trapping on abnormal (inf, NaN) results is handled.  It
becomes, for instance, very cumbersome to say: "do not trap on overflow for
the next three instructions", although that can be perfectly valid in an
algorithm.  (E.g. a place where you can expect an infinity because of
division by zero, but where you know that the next few instructions will
handle it.)  Also the 80x87 does not handle this correctly in my opinion.

Another issue is precision control as it is present on the 80x87 and the
6888[12].  (Here it is possible to indicate that a result from a single
operation should be rounded to a specific precision rather than the internal
double extended precision.)

There ought to be a distinction between status bits and control bits.  When
executing an instruction in a pipeline the control bits ought to be taken
along.  (If you look at the i860 you will see that the pipe already carries
a lot of information beyond the operands.  I think that carrying the control
information would not take excessive space.)  If the control information is
taken along in the pipe it is easy to change the control word without any
need to flush pipelines, as long as the change is synchronized with the
issue of FP instructions.  Also, changing the control information should
never cause a trap (unlike the 80x87, where unmasking an exception that was
raised earlier while masked results in a trap).  The reason is that if you
mask an exception for only a few instructions, unmasking it afterwards
should not result in a trap for the intermediate operations!  You knew the
exceptions might happen.

The next question is, should a masked exception be noted in the sticky
exception status bits?  IEEE tells me yes.  The reason is clear: you may
want to run a piece of code at full speed and check exceptions afterwards
(although I doubt that full speed can be reached if exceptions do occur).
So to satisfy IEEE needs, yes, exceptions should be noted.  (Anyhow, noted
but untrapped exceptions should be seen as a help in debugging, not as
an indication that something is wrong.)

Another question is how to do this if the FP unit is a co-processor (as is
effectively the case on the 88100).  Clearly the setting of the FP control
register ought to be a function of the co-processor; that way it is
possible to ensure that the setting of the register is not executed out
of sequence with other FP instructions.

An alternative is of course, as David Hough said, to have the rounding mode
as part of the instruction (as Motorola does with precision control on the
68040).  Yes, it helps in this case, but not with the masking of exceptions
etc.  Having single instructions to change fields in a control register
would not be as fast as encoding it in the instruction, but would be
helpful in more cases (e.g. exceptions).

A disadvantage of all this is of course that the FP pipes must be made
some bits wider, but I really do not think that is a problem.  We can
question whether it is possible to do it in an upward compatible way
in future implementations of current processors.  (SPARC tells us that
a change in the FP SR does not take effect until some cycles afterwards.)
I think it is possible.  Define a new instruction (executing in the FPU)
that sets/clears some designated fields of the SR/CR.  There is no
conflict in the specs.  But aren't we getting a bit CISCY now?  I think
not (oh and: this is all valid for CISC processors too).

It has been argued that you might be able to lump a whole lot of code
together so as to reduce the number of times the rounding mode is set.
E.g. the loop
	for(i=0;i<n;i++) a[i] = b[i] + c[i];
(where a, b and c are intervals and + is interval addition) might
be split into two loops, one to calculate the lower bounds and one to
calculate the upper bounds.  This works in a number of cases, but fails
on:
	for(i=0;i<n;i++) a[i] = (b[i] + c[i]) * d[i];
because for instance calculating the lower bound in a multiplication can
involve both the lower bounds and the upper bounds of the operands.

All this is of course moot if fp operations are not pipelined!
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl