Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!decwrl!sun!chiba!khb
From: khb%chiba@Sun.COM (Keith Bierman - Sun Tactical Engineering)
Newsgroups: comp.arch
Subject: Re: RISC v. CISC --more misconceptions
Message-ID: <75772@sun.uucp>
Date: 2 Nov 88 11:24:51 GMT
References: <156@gloom.UUCP> <18931@apple.Apple.COM> <40@sopwith.UUCP> <19762@apple.Apple.COM> <1002@l.cc.purdue.edu>
Sender: news@sun.uucp
Reply-To: khb@sun.UUCP (Keith Bierman - Sun Tactical Engineering)
Organization: Sun Microsystems, Mountain View
Lines: 90

In article <1002@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:

>> Of course there are applications that are integer multiplication
>> intensive (as opposed to floating point).
>> What I did say is that they are quite rare.
>
>They are rare because a good programmer knows that they are slow and
>difficult to program.  

Integer multiplication hard to program? Slow? Is this really what is
meant?

>
>> Integer floating point intensive is defined (here and now, by me) to
>> be an application that will suffer a performance degradation of more
>> than 3% without a fast hardware multiplier (2-3 cycles, vs. the
>> average 11 cycles that HP can do in pure software. (A back of the
>> envelope calculation will show that means .3%- pretty high for
>> multiply) Most integer multiplies that I am aware of are used for
>> index scaling and other address calculations. Good optimizing
>> compilers will strength reduce these away
>
>If the double-precision product of two single-precision integers is required,
>and only single-precision products are available, it is necessary to go to
>single-precision products of half-precision numbers.  This takes about 20
>instructions.  How does the poster expect to do it in an average of 11 cycles?
>Many of these jobs are not being done, or are being kludged by finding ways to
>accomplish more-or-less the same results in 10 instructions.  And if a
>subroutine call is made, double the time.

I believe the poster is referring to numerous HP publications (open
literature and manuals). Their algorithm is quite clever and makes use of
the fact that certain multipliers are much more common than others.
Special instructions in Spectrum are employed in conjunction with
delayed branches to perform a multiply in a maximum of 11 cycles (and
often one wins and it takes fewer). I do not think that a discussion of
DP was meant.
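
To give the flavor of the idea, here is a minimal sketch in C. It is
not HP's actual routine; the function names mul10 and mul_shift_add are
made up for illustration. A common small constant is handled with a
couple of shift-and-add steps, and the general case falls back to a
loop that exits as soon as the multiplier runs out of bits, which is
why small multipliers finish early.

```c
#include <stdint.h>

/* x * 10 without a multiply instruction: (x*4 + x) * 2.
   Unsigned arithmetic, so wraparound on overflow is well defined. */
static uint32_t mul10(uint32_t x)
{
    uint32_t x5 = (x << 2) + x;   /* one shift-and-add step: 5*x */
    return x5 << 1;               /* one more shift: 10*x */
}

/* General fallback: classic shift-and-add, one step per multiplier
   bit, stopping early once the remaining multiplier bits are zero.
   The result is the low 32 bits of the product. */
static uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t acc = 0;
    while (b != 0) {
        if (b & 1u)
            acc += a;             /* add the shifted multiplicand */
        a <<= 1;
        b >>= 1;
    }
    return acc;
}
```

A compiler that knows the multiplier at compile time can pick a sequence
like mul10; only multiplies by unknown values need the general loop.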
>
>Many mathematical computations should be made in fixed-point arithmetic.  

NO! I have witnessed far too much of this in DoD embedded computers.
The fixed-point math saved some hardware, but the software was awful.
Because of the handsprings fixed-point math required, it was not
possible to focus on the really big issues (like: is this algorithm
numerically stable? Can we cut the compute cost by a factor of 10 by
altering the problem?). Fixed-point math is sometimes appropriate,
but if anything too much is done in fixed point.
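
For anyone who has not had the pleasure, here is a sketch of what a
single fixed-point multiply involves. The Q15 format, the type name
q15_t and the function q15_mul are my choices for illustration, not
anything from a particular military machine. Every operation needs its
own scaling, rounding and overflow decisions, and they have to be made
again wherever the dynamic range of the data changes.

```c
#include <stdint.h>

typedef int16_t q15_t;   /* value = raw / 32768, range [-1, 1) */

/* Multiply two Q15 numbers, rounding to nearest and saturating.
   Assumes >> on a negative int32_t is an arithmetic shift, which is
   implementation-defined in C but true on the usual compilers. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 product */
    p += (int32_t)1 << 14;                 /* add half an LSB to round */
    p >>= 15;                              /* rescale Q30 -> Q15 */
    if (p > 32767)                         /* only (-1) * (-1) gets here */
        p = 32767;
    if (p < -32768)
        p = -32768;
    return (q15_t)p;
}
```

With floating point all of that is one multiply, and the programmer's
attention is free for the algorithm itself.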

>If
>one does not have the hardware available, the cost is much greater than
>floating point.  If the hardware is available, it is much cheaper.  None
>of the major languages support fixed point.  So none of the hardware gurus
>put it in, so none of the machines have it, so no one programs in it, so
>the inclusion of it is objected to as a waste of resources, etc.

Check out DSP chips. Check out generation after generation of military
chips. Fixed point is very common in some environments. PL/1, PL/"x"
dialects, and miscellaneous special military languages support it.
People code in it, and it is usually a bad design choice.

>
>Another hardware operation missing on most machines is square root.  So one
>does not use algorithms requiring square roots.

Well, I came from the world of Kalman filtering, where the best
algorithms tend to use square roots (though sometimes these can be
avoided). sqrt improves the numerical reliability of many algorithms
(when employed correctly), which more than makes up for its cost (if
you get the right answer in fewer iterations, the extra cost doesn't
necessarily matter). I have not done a head count, but many machines
have sqrt (8087, 6888x, VAX FPA, IBM FPA, Univac).

>
>An application using accurate arithmetic heavily will be spending most of its
>time in multiple-precision subroutines,

Not if good algorithms are employed. For example, in the early '70s
JPL'ers G. Bierman and K. Thornton proved that UD and SRIF mechanized
Kalman filters could be run in SP (36 bits) rather than the DP (72 bits)
required by competing algorithms, and still achieve superior results.
Better algorithm, fewer bits, better results.

Better math logic (like IEEE machines with extended accumulators)
won't spend any time in extended precision routines. VAX FPA and IBM FPAs
also do their math in 64 bits, so that the penalty for extended
precision is the cost of moving more bits to memory (and more paging, etc.).
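
A small sketch of the accumulator point, in C. The names dot_sp and
dot_wide are hypothetical, and this assumes a machine where double
arithmetic is done in hardware. The data stay in single precision; only
the running sum is wider, so no multiple-precision subroutine is ever
called.

```c
#include <stddef.h>

/* Accumulate in single precision: every add rounds to float.
   The volatile forces that rounding even on machines that would
   otherwise keep the sum in a wider register. */
static float dot_sp(const float *x, const float *y, size_t n)
{
    volatile float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc = acc + x[i] * y[i];
    return acc;
}

/* Same single-precision data, but a double-precision accumulator.
   The product of two floats is exact in double, so the only rounding
   left is in the adds, and that happens at the wider precision. */
static double dot_wide(const float *x, const float *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * (double)y[i];
    return acc;
}
```

On long sums dot_wide is typically far closer to the exact answer than
dot_sp, at the price of a wider register and nothing else.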


Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus