Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!mintaka!bloom-beacon!eru!hagbard!sunic!sics.se!fuug!news.funet.fi!tukki.jyu.fi!euler!tt
From: tt@euler.jyu.fi (Tapani Tarvainen)
Newsgroups: comp.sys.hp
Subject: Re: 68040 and Floats, is this true?
Summary: alas, it is true
Message-ID: <TT.91Jun11121823@euler.jyu.fi>
Date: 11 Jun 91 10:18:23 GMT
References: <1991Jun07.213219.14174@lynx.CS.ORST.EDU>
	<TT.91Jun9113646@euler.jyu.fi>
Sender: news@jyu.fi (News articles)
Organization: University of Jyvaskyla
Lines: 130
In-Reply-To: tt@euler.jyu.fi's message of Sun, 9 Jun 1991 09:36:46 GMT
Originator: tt@euler.jyu.fi
Nntp-Posting-Host: euler.jyu.fi

In article <TT.91Jun9113646@euler.jyu.fi> I wrote:


>In article <1991Jun07.213219.14174@lynx.CS.ORST.EDU> curt@OCE.ORST.EDU (Curt Vandetta) writes:

>> A couple of days ago, I read an article (Sorry I lost it) that someone
>> here on the net wrote about their experience with the 68040 upgrade on
>> the HP 9000/400t.  I currently have a 68040 upgrade kit sitting on my
>> desk waiting for HP-UX 7.05.  Is it true that the Floating Point
>> performance suffers as much as the previous post indicated?  I have a
>> really uneasy feeling that it is true.

>Floating Point performance suffers!?  I'd say the question is how much
>it improves ... our experience from the 400t -> 425t upgrade is that
>floating-point intensive programs are speeded up by a factor ranging
>from around two to almost seven.

The original article referred to above arrived here today, and I must
report that I got similar results: the '040 IS much slower with
certain operations.  In particular, *printf()ing floating point
numbers is sloooow.

I dug out the HP-UX 7.05 Release Notes, which give a list of operations
the '040 can't do and which are therefore emulated in software.
I've copied the relevant part here.
(I guess this is technically copyrighted material, but I feel this is
a justified copyright-slaughter if there ever was one.)

! Because there was not enough space on the chip, some instructions were
! chosen to be emulated in software.  That is, instead of having the
! instruction interpreted by the hardware directly, a software trap is taken
! into the kernel, and software in the kernel does the requested operation.
! Because they are done in software, the algorithms used may be slightly
! different than the algorithms that would have been used on the 68882.
! Thus, there are differences in the results of the same instruction on the
! 68882 and 68040.
! 
! Differing results are typically measured in "Units in the Last Place"
! (ULPs), which indicate the distance between the true mantissa and the one
! calculated.  For example, if the real mantissa is 0x4572 and the
! calculated mantissa is 0x456E, the difference is 4 ULPs.
! 
! The MC68882 documentation states that "in general, the worst-case accuracy
! of any transcendental function is one unit in the last place of double
! precision."  The software that emulates these instructions is designed to
! give the same accuracy. This means that, on average, the double precision
! representation should be within one ULP of the true value. This does not
! mean that the 68882 and the 68040 give identical results, only that they
! both should be close to the desired value.
! 
! Emulated Instructions
! ---------------------
! The instructions which are emulated in software are given below.
! Instructions marked with a (*) return exact results, the others are within
! one ULP in double precision.
! 
! 	Instr.	 Description		HP-UX Usage
! 	-------------------------------------------------------------
! 	Trig Functions
! 	 fcos	 Cosine			libm, inline Fortran/C
! 	 facos	 Arc Cosine		libm, inline Fortran/C
! 	 fsincos Sine and Cosine
! 	 ftan	 Tangent		libm, inline Fortran/C
! 	 fsin	 Sine			libm, inline Fortran/C
! 	 fasin	 Arc Sine		libm, inline Fortran/C
! 	 fatan	 Arc Tangent		libm, inline Fortran/C
! 
! 	Hyperbolic Functions
! 	 fsinh	 Hyperbolic Sine	libm, inline Fortran/C
! 	 fcosh	 Hyperbolic Cosine	libm, inline Fortran/C
! 	 ftanh	 Hyperbolic Tangent	libm, inline Fortran/C
! 	 fatanh	 Arc Hyperbolic Tangent
! 
! 	Exponential Functions
! 	 flog2	 Log base 2
! 	 flog10	 Log base 10		libm, inline Fortran/C
! 	 flogn	 Log base e		libm, inline Fortran/C
! 	 flognp1 Log base e of (x+1)
! 	 ftwotox 2 to the x
! 	 ftentox 10 to the x
! 	 fetox	 e to the x		libm, inline Fortran/C
! 	 fetoxm1 (e to the x) - 1
! 
! 	Utility Functions
! 	 fint	 Integer Part (*)	Fortran Library
! 	 fintrz	 Same, Round Zero (*)	All Compiled Code using floats
! 	 fgetexp Get Exponent (*)
! 	 fgetman Get Mantissa (*)
! 	 frem	 IEEE Remainder
! 	 fscale	 Scale Exponent
! 	 fmod	 Modulo Remainder	Fortran Library
! 
! 
! Unsupported Data Types
! ----------------------
! Besides the emulated instructions discussed above, the MC68040 does not
! have support for any kind of denormalized numbers on the chip.  This
! includes denormalized single and double precision numbers, as well as the
! less common denormalized extended precision. In order to handle these
! types, a software trap is taken into the kernel when these data types are
! encountered.
! 
! A denormalized number is a smaller number than could normally be
! represented.  These are included to extend the range around zero.  Since
! they are a minority, and since the data type handler can do exactly what the
! 68882 can do (that is, answers between the two chips should be the same),
! this should not cause any problems for most users.  Because of the trap
! and emulate, dealing with denormalized numbers will be much slower than
! dealing with normalized numbers.
! 
! Another data type which is not supported is packed decimal.  Packed
! decimal is used to convert from binary floating point formats to the usual
! decimal form.  This type is used by scanf() and printf() to input and
! output floating point numbers.  Since the emulator uses the same algorithm
! that the 68882 used, the two chips should give the same result.
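
To make the ULP arithmetic in the quoted notes concrete, here is a
minimal C sketch (mine, not from the Release Notes) that counts how
many representable doubles lie between two values.  It assumes IEEE
754 64-bit doubles and finite inputs; the names ordered_bits and
ulp_distance are made up for illustration.  When both values share
the same exponent, the result equals the mantissa difference used in
the 0x4572 / 0x456E example.

```c
#include <stdint.h>
#include <string.h>

/* Map a double's bit pattern onto an integer line whose order matches
   numeric order.  Assumes IEEE 754 64-bit doubles. */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);
    /* Negative doubles are stored sign-magnitude; mirror them so that
       -0.0 maps to 0 and larger magnitudes map further below zero. */
    return (i < 0) ? INT64_MIN - i : i;
}

/* Number of representable doubles between a and b (finite inputs only). */
uint64_t ulp_distance(double a, double b)
{
    int64_t ia = ordered_bits(a);
    int64_t ib = ordered_bits(b);
    return (ia > ib) ? (uint64_t)ia - (uint64_t)ib
                     : (uint64_t)ib - (uint64_t)ia;
}
```

Feeding it the same libm result computed on a 68882 machine and on an
'040 machine would show directly how far apart the two answers are.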


Some comments: Cursory testing suggests that for the most part the
emulation is quite effective.  In particular, trigs and logs appear
significantly faster on the '040 even though it's emulating them in
software.

The critical thing in the present case is, I think, revealed in
the last paragraph I quoted above: packed decimal support.
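
For anyone who wants to check this on their own machine, a rough
timing sketch in C follows.  It is only an illustration under my own
assumptions: the helper names (seconds_for, op_format, op_multiply),
the "%.15g" format and the argument sequence are arbitrary choices,
and clock() resolution may be coarse on some systems.

```c
#include <stdio.h>
#include <time.h>

/* Results are stored here so the compiler cannot discard the work. */
static volatile double sink;

/* Baseline: one floating-point multiply, done in hardware. */
void op_multiply(double x)
{
    sink = x * 1.0000001;
}

/* Suspect: binary-to-decimal conversion through sprintf(). */
void op_format(double x)
{
    char buf[64];
    sink = (double)sprintf(buf, "%.15g", x);
}

/* CPU seconds used by n calls of op on slowly varying arguments. */
double seconds_for(void (*op)(double), long n)
{
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        op(1.0 + (double)i * 1e-3);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing seconds_for(op_format, n) with seconds_for(op_multiply, n)
on a 68882 machine and on an '040 machine should show whether the
formatting path is where the time goes.  A wrapper around sin() or
log() can be timed the same way (link with -lm).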

HP: PLEASE do something about this.  If you can't speed up
the packed decimal support emulation then try to rewrite
*printf() and *scanf() without it.
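
As a sketch of what "without it" could look like, here is a small
fixed-point formatter in C that produces decimal digits with integer
arithmetic only.  This is my own illustration, not HP's code and not a
replacement for printf(): format_fixed is a made-up name, it handles
only |x| < 1e9 with at most 9 decimals, and its round-half-up step is
not correctly rounded in every corner case.  Note also that the
double-to-integer conversion may itself compile to fintrz, which the
list above marks as emulated.

```c
#include <stddef.h>

/* Write x with `decimals` digits after the point into out.
   Returns the length written, or -1 if x is outside the supported
   range or the buffer is too small. */
int format_fixed(char *out, size_t size, double x, int decimals)
{
    char digits[32];
    unsigned long long scaled, pow10 = 1;
    int i, n = 0, len = 0, neg = (x < 0.0);

    if (decimals < 0 || decimals > 9) return -1;
    if (neg) x = -x;
    if (!(x < 1e9)) return -1;        /* also rejects NaN and infinity */

    for (i = 0; i < decimals; i++) pow10 *= 10;
    /* Scale to an integer count of the last decimal, rounding half up. */
    scaled = (unsigned long long)(x * (double)pow10 + 0.5);

    /* Peel off digits, least significant first; keep going until at
       least one digit exists in front of the decimal point. */
    do {
        digits[n++] = (char)('0' + scaled % 10);
        scaled /= 10;
    } while (scaled != 0 || n <= decimals);

    if ((size_t)(neg + n + (decimals > 0) + 1) > size) return -1;
    if (neg) out[len++] = '-';
    while (n > 0) {
        if (n == decimals) out[len++] = '.';
        out[len++] = digits[--n];
    }
    out[len] = '\0';
    return len;
}
```

A real replacement would also need exponent notation, full range and
correct rounding, which is where most of the work lies.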
--
Tapani Tarvainen    (tarvaine@jyu.fi, tarvainen@finjyu.bitnet)