Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!decwrl!sun!chiba!khb
From: khb%chiba@Sun.COM (Keith Bierman - Sun Tactical Engineering)
Newsgroups: comp.arch
Subject: Re: RISC v. CISC --more misconceptions
Message-ID: <75772@sun.uucp>
Date: 2 Nov 88 11:24:51 GMT
References: <156@gloom.UUCP> <18931@apple.Apple.COM> <40@sopwith.UUCP> <19762@apple.Apple.COM> <1002@l.cc.purdue.edu>
Sender: news@sun.uucp
Reply-To: khb@sun.UUCP (Keith Bierman - Sun Tactical Engineering)
Organization: Sun Microsystems, Mountain View
Lines: 90

In article <1002@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>> Of course there are applications that are integer multiplication
>> intensive (as opposed to floating point).
>> What I did say is that they are quite rare.
>
>They are rare because a good programmer knows that they are slow and
>difficult to program.

Integer multiplication hard to program? Slow? Is this really what is meant?

>
>> Integer floating point intensive is defined (here and now, by me) to
>> be an application that will suffer a performance degradation of more
>> than 3% without a fast hardware multiplier (2-3 cycles, vs. the
>> average 11 cycles that HP can do in pure software. (A back of the
>> envelope calculation will show that means .3%- pretty high for
>> multiply) Most integer multiplies that I am aware of are used for
>> index scaling and other address calculations. Good optimizing
>> compilers will strength reduce these away
>
>If the double-precision product of two single-precision integers is required,
>and only single-precision products are available, it is necessary to go to
>single-precision products of half-precision numbers. This takes about 20
>instructions. How does the poster expect to do it in an average of 11 cycles?
>Many of these jobs are not being done, or are being kludged by finding ways to
>accomplish more-or-less the same results in 10 instructions. And if a
>subroutine call is made, double the time.

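For concreteness, the half-word decomposition described in the quoted text can be sketched in C roughly as follows. This is only an illustration, assuming 32-bit "single precision" and 16-bit "half precision" unsigned words; the name mul32x32_64 is made up here, and this is not the HP routine.

```c
#include <stdint.h>

/* Hypothetical helper (name and word sizes are assumptions for this sketch):
   form the 64-bit product of two unsigned 32-bit integers using only
   16x16 -> 32 bit partial products, returned as two 32-bit halves. */
static void mul32x32_64(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    /* Split each operand into 16-bit halves. */
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* Four partial products; each fits in 32 bits. */
    uint32_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint32_t p1 = a_lo * b_hi;   /* weight 2^16 */
    uint32_t p2 = a_hi * b_lo;   /* weight 2^16 */
    uint32_t p3 = a_hi * b_hi;   /* weight 2^32 */

    /* Sum the 2^16 column; three 16-bit terms cannot overflow 32 bits. */
    uint32_t mid = (p0 >> 16) + (p1 & 0xFFFFu) + (p2 & 0xFFFFu);

    /* Low word: bottom 16 bits of p0, then the bottom 16 bits of mid. */
    *lo = (p0 & 0xFFFFu) | (mid << 16);

    /* High word: p3 plus the upper halves of p1, p2 and the carry from mid. */
    *hi = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);
}
```

Four multiplies plus the masking, shifting and adding lands in the neighborhood of the "about 20 instructions" quoted above, depending on the machine.
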
I believe the poster is referring to numerous HP publications (open
literature and manuals). Their algorithm is quite clever and makes use of
the fact that certain multipliers are much more common than others. Special
instructions in Spectrum are employed in conjunction with delayed branches
to perform a multiply in a maximum of 11 cycles (and often one wins and it
takes fewer). I do not think that a discussion of DP was meant.

>
>Many mathematical computations should be made in fixed-point arithmetic.

NO! I have witnessed far too much of this in DoD embedded computers. The
fixed-point math saved some hardware, but the software was awful. Because
of the handsprings fixed-point math required, it was not possible to focus
on the real big issues (like: is this algorithm numerically stable? Can we
cut the compute cost by a factor of 10 by altering the problem?). Fixed-point
math is sometimes appropriate, but if anything too much is done in fixed
point.

>If one does not have the hardware available, the cost is much greater than
>floating point. If the hardware is available, it is much cheaper. None
>of the major languages support fixed point. So none of the hardware gurus
>put it in, so none of the machines have it, so no one programs in it, so
>the inclusion of it is objected to as a waste of resources, etc.

Check out DSP chips. Check out generation after generation of military
chips. Fixed point is very common in some environments. PL/1, PL/"x"
(dialects) and miscellaneous special military languages support it. People
code in it, and it is usually a bad design choice.

>
>Another hardware operation missing on most machines is square root. So one
>does not use algorithms requiring square roots.

Well, I came from the world of Kalman filtering, where the best algorithms
tend to use square roots (though sometimes these can be avoided).
sqrt improves the numerical reliability of many algorithms (when employed
correctly), which more than makes up for its speed: if you get the right
answer in fewer iterations, the extra cost doesn't necessarily matter. I
have not done a head count, but many machines have sqrt (8087, 6888x, VAX
FPA, IBM FPA, Univac).

>
>An application using accurate arithmetic heavily will be spending most of its
>time in multiple-precision subroutines,

Not if good algorithms are employed. For example, in the early '70s JPLers
G. Bierman and K. Thornton proved that UD and SRIF mechanized Kalman
filters could be run in SP (36 bits), rather than the DP (72 bits) required
by competing algorithms, and achieve superior results. Better algorithm,
fewer bits, better results.

Better math logic (like IEEE machines with extended accumulators) won't
spend any time in extended-precision routines. The VAX FPA and IBM FPAs also
do their math in 64 bits, so the penalty for extended precision is the
cost of moving more bits to memory (and more paging, etc.).

Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus