Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cornell!uw-beaver!tera.com!bob
From: bob@tera.com (Bob Alverson)
Newsgroups: comp.arch
Subject: Re: Killer Micro II
Message-ID: <1990Aug31.160357.19057@tera.com>
Date: 31 Aug 90 16:03:57 GMT
References: <2482@l.cc.purdue.edu> <2868@inews.intel.com>
Sender: news@tera.com
Reply-To: bob@colossus.tera.com.UUCP (Bob Alverson)
Organization: Tera Computer Company
Lines: 37

In article <2868@inews.intel.com> jsweedle@mipos2.UUCP (Jonathan Sweedler) writes:
>This is probably coming from Prof. Kahan via IBM. From personal talks with
>Prof. Kahan and from some postings to the numeric interests mailing list,
>it seems that Prof. Kahan's next crusade is to convince people that IEEE
>double precision won't be good enough for future software. He feels that
>problems tend to grow as systems tend to grow (more memory and become
>faster). As problems grow, more accuracy is needed. As David Hough wrote,
>in a letter to the numerics interest group, more precision is needed to:

One way to get "128" bit precision is with a pair of 64-bit FP numbers.
Kahan calls this "doubled" precision. The IBM RS/6000 has some support
for this, with its single-round multiply-add. You can find the exact
product of two doubles a, b as P = round(a*b), p = a*b - P, where the
second expression is one multiply-add: a*b is not rounded before P is
subtracted, so p is exactly the rounding error of P. In their technology
book, they show how to do the product (A,a)*(B,b). However, there is no
discussion there of how to do "doubled" precision adds. You can do them
without any special support functions, but Kahan has advocated special
hardware to make it faster.
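To make the two claims above concrete, here is a minimal sketch in C++ of the exact product via a fused multiply-add, and of a "doubled" add that needs no special support. The struct layout and the names two_prod, two_sum, quick_two_sum, and add_software are my own choices for illustration, and the add is built from the well-known error-free two-sum transformation, which may not be the method anyone quoted here had in mind. It assumes IEEE double arithmetic with round-to-nearest, no overflow or underflow, and a compiler that does not reassociate floating point (no -ffast-math).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical pair type: the value represented is hi + lo.
struct doubled { double hi, lo; };

// Exact product of two doubles: P is the rounded product and p is its
// rounding error, obtained from one fused multiply-add (single rounding).
doubled two_prod(double a, double b) {
    double P = a * b;
    double p = std::fma(a, b, -P);   // a*b - P, exact under the assumptions above
    return {P, p};
}

// Error-free sum of two doubles (Knuth's two-sum): s + e == a + b exactly.
doubled two_sum(double a, double b) {
    double s  = a + b;
    double bb = s - a;
    double e  = (a - (s - bb)) + (b - bb);
    return {s, e};
}

// Cheaper renormalisation, intended for |a| >= |b| (or a == 0).
doubled quick_two_sum(double a, double b) {
    double s = a + b;
    double e = b - (s - a);
    return {s, e};
}

// "Doubled" add using only ordinary double adds and subtracts.
doubled add_software(doubled a, doubled b) {
    doubled s = two_sum(a.hi, b.hi);   // high parts, with their exact error
    doubled t = two_sum(a.lo, b.lo);   // low parts, with their exact error
    s.lo += t.hi;
    s = quick_two_sum(s.hi, s.lo);     // renormalise
    s.lo += t.lo;
    return quick_two_sum(s.hi, s.lo);  // final renormalise
}
```

For example, two_prod(x, x) with x = 1 + 2^-30 gives hi = 1 + 2^-29 and lo = 2^-60, the bit that an ordinary double multiply throws away. The software add costs about twenty floating-point operations, which is the overhead that hardware support would be meant to cut.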

One hardware approach is a triple-add with only a single round (from Kahan):

    /* add3(x, y, z) stands for x + y + z computed with a single rounding. */
    /* It is the proposed hardware operation, not an existing function.    */
    doubled operator+(doubled a, doubled b)
    {
        doubled sum;
        double t1 = a.lo + b.lo;
        double t2 = add3(t1, a.hi, b.hi);   /* rounded total              */
        double t3 = add3(a.hi, b.hi, -t2);  /* high parts minus the total */
        double t4 = add3(t3, a.lo, b.lo);   /* what t2 left out           */
        sum.hi = t2 + t4;
        sum.lo = add3(t2, t4, -sum.hi);     /* error of that last add     */
        return sum;
    }

I think there are other ways to make "doubled" adds go fast, but I don't
want to toot my own horn until I'm sure it's on pitch.

One nice thing about extending precision by using a pair of 64-bit floats
is that it extends infinitely to an arbitrary n-tuple of floats.

Bob