Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!clyde!cbatt!ihnp4!houxm!houxk!houxs!daw
From: daw@houxs.UUCP (D.WOLVERTON)
Newsgroups: comp.arch
Subject: How does compiled code use the floating point unit?
Message-ID: <394@houxs.UUCP>
Date: Fri, 5-Dec-86 16:15:04 EST
Article-I.D.: houxs.394
Posted: Fri Dec  5 16:15:04 1986
Date-Received: Sat, 6-Dec-86 18:13:38 EST
Organization: AT&T Information Systems, Holmdel NJ
Lines: 64


In some systems, the hardware floating point (fp) unit is _optional_.
The Itty Bitty Machines (IBM) PC is a good example.  From the
point of view of a compiler writer, how does one deal with
that uncertainty? [<--this one's a rhetorical question]

I know of, or can imagine, several flavors of code generation
in the face of this situation:

1)  Code generation emits calls to a floating point library.
This library checks for the presence of fp hardware, and uses
the fp hardware if it is present; otherwise it emulates the operation.

2)  Like (1), but the test for the fp unit is made inline, before the
function call.  The code is larger, but in the case where the fp unit
is present it is faster because no function call is made.

3)  Code generation pretends that the fp unit will always be present,
so it emits fp instructions directly into the instruction stream.
If a fp unit is not present, the hardware arranges for a trap to
occur which transfers control to the OS.  At this point either:

	a)  The OS recognizes that a fp operation was intended,
	and completes the operation by executing its own emulation
	code.  Control is then transferred back to the user code.

	b)  The OS recognizes that a fp operation was intended,
	and calls a special fp emulation entry point in the user code.  
	When the function which emulates the fp operation is finished,
	it transfers control back to the user code.

4)  Code generation emits code which always causes transfer to the
OS, e.g. by illegal opcodes or TRAP instructions.  The OS then
proceeds like (3a) or (3b) above except that the fp unit may be used 
if present.
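
To make the difference between (1) and (2) concrete, here is a rough
sketch in C.  Everything in it is made up for illustration: the
fpu_present flag (imagine it set once at startup by probing for the
hardware), the soft_add/hard_add stand-ins, and the FP_ADD macro are
assumptions, not any real compiler's runtime interface.

```c
/* Hypothetical flag, assumed to be set once at startup by a probe. */
static int fpu_present = 0;

/* Counts trips through the emulator, so the two paths can be told apart. */
static int emulated_calls = 0;

/* Stand-in for software emulation.  A real emulator would do the
   arithmetic with integer instructions; host arithmetic is used here. */
static double soft_add(double a, double b)
{
    emulated_calls++;
    return a + b;
}

/* Stand-in for a single hardware fp add instruction. */
static double hard_add(double a, double b)
{
    return a + b;
}

/* Scheme (1): the compiler always emits a call to this library routine,
   and the presence test happens inside the library. */
double fp_add(double a, double b)
{
    return fpu_present ? hard_add(a, b) : soft_add(a, b);
}

/* Scheme (2): the compiler emits the presence test at every use.  The
   code is bigger, but no call is made when the fp unit is present. */
#define FP_ADD(a, b) (fpu_present ? ((a) + (b)) : soft_add((a), (b)))
```

With fpu_present set to 0, both fp_add(1.0, 2.0) and FP_ADD(1.0, 2.0)
go through soft_add; with it set to 1, only fp_add still pays for a
function call.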

I like (3) the best.  In the case where a fp unit is present, the
performance is no worse than if it was assumed that the fp unit would
_always_ be present.  If a fp unit is not present, the user's code
will still execute, but more slowly.  Furthermore, the user can
upgrade his floating point performance by adding the fp unit, without
re-compiling his code.
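
The "no re-compiling" property of (3a) can be shown with a toy model
in C.  This is only a simulation under invented assumptions: the
little stack-machine instruction set, struct machine, and the
os_fp_trap handler are hypothetical and do not describe any real
processor or OS.  The point is that one unchanged instruction stream
gives the same answer with or without the fp unit; only the number of
traps differs.

```c
#include <stddef.h>

/* Hypothetical toy instruction set, invented for this sketch. */
enum opcode { OP_PUSH, OP_FADD, OP_FMUL, OP_HALT };

struct insn {
    enum opcode op;
    double operand;          /* used by OP_PUSH only */
};

struct machine {
    int fpu_present;         /* is the optional fp unit installed? */
    double stack[16];
    size_t sp;
    unsigned traps;          /* traps taken into the "OS" */
};

/* The fp unit executing an fp instruction directly. */
static void fpu_execute(struct machine *m, enum opcode op)
{
    double b = m->stack[--m->sp];
    double a = m->stack[--m->sp];
    m->stack[m->sp++] = (op == OP_FADD) ? a + b : a * b;
}

/* Scheme (3a): with no fp unit, the instruction traps and the "OS"
   emulates it, then returns to the user code.  Host arithmetic stands
   in for a real integer-only emulator. */
static void os_fp_trap(struct machine *m, enum opcode op)
{
    double b = m->stack[--m->sp];
    double a = m->stack[--m->sp];
    m->traps++;
    m->stack[m->sp++] = (op == OP_FADD) ? a + b : a * b;
}

/* Run the same "compiled" code on either kind of machine. */
double run(struct machine *m, const struct insn *code)
{
    m->sp = 0;
    m->traps = 0;
    for (;; code++) {
        switch (code->op) {
        case OP_PUSH:
            m->stack[m->sp++] = code->operand;
            break;
        case OP_FADD:
        case OP_FMUL:
            if (m->fpu_present)
                fpu_execute(m, code->op);
            else
                os_fp_trap(m, code->op);
            break;
        case OP_HALT:
            return m->stack[m->sp - 1];
        }
    }
}
```

Running a program that computes (1.5 + 2.5) * 2.0 on a machine with
fpu_present clear takes one trap per fp instruction; setting
fpu_present and running the identical program takes none.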

(3a) has the slight additional advantage over (3b) that the user programs
will be smaller because they do not have to carry the baggage of a fp 
emulation library.

However, (3) also requires that the fp unit architecture is known a priori.
It also does not account for a need to support more than one incompatible
fp unit.

Now the questions:

	Are there other scenarios in use?

	Anyone have a different choice for "best"?  Why?

	Which is "best" if more than one fp unit must be supported, or
	if the architecture of the fp unit is not known a priori?

===================================================================
David Wolverton
...!ihnp4!houxs!daw		AT&T Information Systems, Holmdel