Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!olivea!genie!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: isamax and instruction set design
Message-ID:
Date: 19 Jun 91 12:38:15 GMT
References: <396@validgh.com> <1991Jun13.234834.22970@neon.Stanford.EDU> <1991Jun14.134338.4673@linus.mitre.org> <1991Jun14.163141.17728@rice.edu> <64739@bbn.BBN.COM> <677323133.AA763@flaccid>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 45
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: tonys@pyra.co.uk's message of 19 Jun 91 09:18:53 GMT

>>>>> On 19 Jun 91 09:18:53 GMT, tonys@pyra.co.uk (Tony Shaughnessy) said:

Tony> In article <64739@bbn.BBN.COM> slackey@BBN.COM (Stan Lackey) writes:
>
>I once worked on a machine that had instructions like x=min(y,z) in
>hardware.  Also there were vector instructions that would elementwise
>compare vectors and build a vector of logicals which could be applied
>to a subsequent vector op.  So a loop of the form
>
>    where (a(i) .lt. b(i)) c(i) = d(i) + e(i)
>
>could be done with two vector instructions and no branches (other than
>loop control, of course).
>
>-Stan

Tony> How many cycles would it take to run each of these instructions?
Tony> If it would take many cycles, then wouldn't this dominate any
Tony> expense incurred in taking a branch, or are there other reasons
Tony> for avoiding a branch?

On the Cyber 205/ETA-10, the second instruction requires one cycle per
element, whether the result is stored or not.  I think that the first
instruction also requires one cycle per element.  Then add two vector
startups to the total time required.

So the cycle count is something like (on the Cyber 205):

    WHERE block:    140 + 2*N
    original op:     70 +   N    (No IF tests -- compute at all locations)

For long vectors, if a(i)