Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!samsung!uunet!mcsun!hp4nl!cwi.nl!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: comp.arch
Subject: Changing IEEE rounding modes on the fly (was Re: somthing else)
Message-ID: <3751@charon.cwi.nl>
Date: 21 Jun 91 00:59:00 GMT
References: <9106190449.AA02871@ucbvax.Berkeley.EDU> <1991Jun19.165150.2121@shinobu.sgi.com>
Sender: news@cwi.nl
Organization: CWI, Amsterdam
Lines: 82

The issue about rounding modes and whether changing them takes a long time or not is obviously a bit more intricate than I thought at first. In a previous posting I gave timings (from the manual) for the 88100, but apparently the fp pipelines must be flushed before a change to the fp control register is performed. So I was wrong, and the 88100 does not meet the factor 3 criterion. (Alas, this is not documented, and the flush must be programmed!)

What disturbs me is that this holds not only for the change of rounding mode, but also for the way trapping on abnormal (inf, NaN) results is handled. It becomes, for instance, very cumbersome to say: "do not trap on overflow for the next three instructions", although that can be perfectly valid in the algorithm. (E.g. a place where you can expect an infinity because of division by zero, but where you know that the next few instructions will handle it.) Also, the 80x87 does not handle this correctly in my opinion. Another issue is precision control as it is present on the 80x87 and the 6888[12]. (Here it is possible to indicate that the result of a single operation should be rounded to a specific precision rather than to the internal double extended precision.)

There ought to be a distinction between status bits and control bits. When an instruction is executed in a pipeline, the control bits ought to be taken along with it. (If you look at the i860 you will see that the pipe already carries a lot of information beyond the operands. I think that carrying the control information would not take excessive space.)

If the control information is taken along in the pipe, it is easy to change the control word without any need to flush pipelines, as long as the change is synchronized with the issue of FP instructions. Also, changing the control information should never cause a trap (unlike the 80x87, where unmasking a previously raised but masked exception results in a trap). The reason is that if you mask an exception for only a few instructions, unmasking it afterwards should not result in a trap for the intermediate operations! You knew the exceptions might happen.

The next question is: should a masked exception be noted in the sticky exception status bits? IEEE tells me yes. The reason is clear: you may want to run a piece of code at full speed and check for exceptions afterwards (although I doubt that full speed can be reached if exceptions do occur). So to satisfy IEEE needs, yes, exceptions should be noted. (Anyhow, exceptions that are noted but not trapped should be seen as a help in debugging, not as an indication that something is wrong.)

Another question is how to do this if the FP unit is a co-processor (as is effectively the case on the 88100). Clearly the setting of the FP control register ought to be a function of the co-processor; in that way it is possible to ensure that the setting of the register is not executed out of sequence with other FP instructions.

An alternative is of course, as David Hough said, to have the rounding mode as part of the instruction (like Motorola does with precision control on the 68040). Yes, it helps in this case, but not in the masking of exceptions etc. Having single instructions to change fields in a control register would not be as fast as encoding the mode in the instruction, but would be helpful in more cases (e.g. exceptions). A disadvantage of all this is of course that the FP pipes must be made some bits wider, but I really do not think that is a problem.

We can question whether it is possible to do this in an upward compatible way in future implementations of current processors. (SPARC tells us that a change in the FP SR does not take effect until some cycles afterwards.) I think it is possible. Define a new instruction (executing in the FPU) that sets/clears some designated fields of the SR/CR. There is no conflict in the specs. But aren't we getting a bit CISCy now? I think not (oh, and this is all valid for CISC processors too).

It has been argued that you might be able to lump a whole lot of code together such that you can reduce the number of settings of the rounding mode. E.g. the loop for(i=0;i