Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.csd.uwm.edu!bionet!ames!amdcad!sun!road!khb
From: khb@road.Sun.COM (Keith Bierman - Advanced Languages - Floating Point Group )
Newsgroups: comp.arch
Subject: Re: John von Neumann, sqrt instr
Message-ID: <122600@sun.Eng.Sun.COM>
Date: 19 Aug 89 01:57:20 GMT
References: <21353@cup.portal.com> <25643@obiwan.mips.COM> <1513@l.cc.purdue.edu> <2376@wyse.wyse.com>
Sender: news@sun.Eng.Sun.COM
Reply-To: khb@sun.UUCP (Keith Bierman - Advanced Languages - Floating Point Group )
Distribution: usa
Organization: Sun Microsystems, Mountain View
Lines: 44

In article <2376@wyse.wyse.com> stevew@wyse.UUCP (Steve Wilson xttemp dept303) writes:
>In article <1513@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>The Cydra-5 included hardware divide and square-root on the some
>board ....
>
>I know that both operations sure did a number on scheduling the inner-most
>loop. Both operations had a long latency, thus caused scheduling
>headaches.
>
>How does the scientific computing community feel about this functionality?

The Cydra-5's very long divide latency proved very harmful when
running customer-level application benchmarks. Divide may be "rare",
but when it happens it happens often. Furthermore, the very long
(26ish cycle) latency was compounded by a decision to reuse some of
the stages ... so it was 26 cycles between initiations, as opposed to
1 or 2 cycles for most other operations. This resulted in compile
times of HOURS for some simple loops ( / / / /) while the compiler
tried to get a sensible schedule (for some reason, my suggestion of
simply having a compiler directive to give up after a couple of
minutes wasn't accepted).

Both divide and sqrt crop up when one wants to be VERY careful about
numerics ... several really good algorithms rely on them. There are
quicker algorithms for those applications, but they are usually less
numerically robust.
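
To put rough numbers on the "cycles between initiations" point, here is a minimal sketch of the resource-constrained lower bound on a software-pipelined loop's initiation interval. The 26-cycle figure is the one quoted above; everything else (the operation mix, one functional unit per operation type, and the names res_mii, loop, unpipelined and pipelined) is an illustrative assumption, not a description of the actual Cydra 5 hardware or compiler.

```python
def res_mii(op_counts, units):
    """Lower bound on cycles per loop iteration imposed by functional units.

    op_counts: {op: number of such operations in one loop iteration}
    units:     {op: (number of units, cycles between initiations on one unit)}
    """
    bound = 1
    for op, count in op_counts.items():
        n_units, issue_gap = units[op]
        busy_cycles = count * issue_gap           # unit-cycles needed per iteration
        needed = -(-busy_cycles // n_units)       # integer ceiling division
        bound = max(bound, needed)
    return bound


# Hypothetical inner loop: four adds, four multiplies, four divides.
loop = {"fadd": 4, "fmul": 4, "fdiv": 4}

# Assumed machine A: divide unit accepts a new operation only every 26 cycles.
unpipelined = {"fadd": (1, 1), "fmul": (1, 1), "fdiv": (1, 26)}

# Assumed machine B: same divide latency, but a new divide can start every cycle.
pipelined = {"fadd": (1, 1), "fmul": (1, 1), "fdiv": (1, 1)}

print(res_mii(loop, unpipelined))  # 104
print(res_mii(loop, pipelined))    # 4
```

Under these assumptions the divide unit alone dictates the schedule: latency can be hidden by overlapping iterations, but the gap between initiations cannot.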

The Cydra 5 failed primarily for business reasons, but there were
some suboptimal technical decisions, and I'd place the 26-cycle II
(initiation interval) for divide on that list. Key applications went
a good 10x slower; compile times went exponentially bad (though that
was fixable). The cost of NOT having done a better job on this was
quite large. I don't know if the designers gave thought to giving up
sqrt for a pipelined divide ... but it would have been a very good
trade.

While no one at Ardent will 'fess up, I feel pretty confident in
guessing that their next machine will have divide, or they will adopt
a Cray-style "divide".

Keith H. Bierman    |*My thoughts are my own. !! kbierman@sun.com
It's Not My Fault   | MTS --Only my work belongs to Sun*
I Voted for Bill &  | Advanced Languages/Floating Point Group
Opus                | "When the going gets Weird .. the Weird turn PRO"