Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!think.com!bbn.com!slackey
From: slackey@bbn.com (Stan Lackey)
Newsgroups: comp.arch
Subject: Re: isamax and instruction set design
Message-ID: <64761@bbn.BBN.COM>
Date: 19 Jun 91 17:54:10 GMT
References: <396@validgh.com> <1991Jun13.234834.22970@neon.Stanford.EDU> <1991Jun14.134338.4673@linus.mitre.org> <1991Jun14.163141.17728@rice.edu> <64739@bbn.BBN.COM> <677323133.AA763@flaccid>
Sender: news@bbn.com
Reply-To: slackey@BBN.COM (Stan Lackey)
Organization: Bolt Beranek and Newman Inc., Cambridge MA
Lines: 36

>>>>>> On 19 Jun 91 09:18:53 GMT, tonys@pyra.co.uk (Tony Shaughnessy) said:
>
>>I once worked on a machine that had instructions like x=min(y,z) in
>>hardware. Also there were vector instructions that would elementwise
>>compare vectors and build a vector of logicals which could be applied
>>to a subsequent vector op. So a loop of the form
>>
>>        where (a(i) .lt. b(i)) c(i) = d(i) + e(i)
>>
>>could be done with two vector instructions and no branches (other than
>>loop control, of course).
>
>Tony> How many cycles would it take to run each of these instructions?
>Tony> If it would take many cycles, then wouldn't this dominate any
>Tony> expense incurred in taking a branch, or are there other reasons
>Tony> for avoiding a branch?

The min/max took one or two cycles, depending on which version of the CPU
you had.

The interesting thing about the scalar machine was that it was heavily
pipelined and used branch prediction. Unfortunately, branches that depend
on floating-point values are often predicted wrong about 50% of the time,
causing the pipe to be "drained" and restarted. The min/max took the same
amount of time as the compare by itself; with a compare-and-branch instead,
the cost of the branch and the following assignment would be added on top.

The vector case took one cycle per element plus an overhead of a few cycles
per instruction, so the total time for the two instructions was
2*(#elements + overhead).
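To make the two-instruction `where` sequence concrete, here is a minimal C sketch that emulates it with scalar loops: one pass does the elementwise compare and builds the vector of logicals (the mask), and a second pass computes d(i) + e(i) everywhere but stores the result only where the mask is set. The function names `vcompare_lt`, `vadd_masked`, and `masked_where_cycles` are hypothetical stand-ins, not the actual instructions of the machine described. Whether a C compiler turns the `?:` select into a branch-free instruction depends on the target, so treat this as an illustration of the data flow rather than of the hardware. The cycle helper just restates the post's 2*(#elements + overhead) estimate with a caller-supplied placeholder overhead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "vector instruction" 1: elementwise compare of a and b,
   producing a vector of logicals (the mask). No data-dependent branch. */
void vcompare_lt(const double *a, const double *b, bool *mask, size_t n) {
    for (size_t i = 0; i < n; i++)
        mask[i] = a[i] < b[i];
}

/* Hypothetical "vector instruction" 2: masked add. d[i] + e[i] is computed
   for every element, but c[i] is only overwritten where mask[i] is true;
   elsewhere c keeps its old value, as in Fortran's WHERE. */
void vadd_masked(const double *d, const double *e, double *c,
                 const bool *mask, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double sum = d[i] + e[i];
        c[i] = mask[i] ? sum : c[i];   /* select, not control flow */
    }
}

/* Rough cycle estimate for the two-instruction sequence, following the
   post: one cycle per element plus a fixed per-instruction overhead.
   `overhead` is a placeholder supplied by the caller, not a measured value. */
unsigned long masked_where_cycles(unsigned long n, unsigned long overhead) {
    return 2 * (n + overhead);
}
```

For example, with a = {1, 5, 2, 8}, b = {3, 4, 6, 7}, d = {10, 20, 30, 40}, e = {1, 2, 3, 4}, and c initially all -1, the mask comes out {true, false, true, false} and c becomes {11, -1, 33, -1}: the loop body runs the same way for every element, so there is nothing for a branch predictor to get wrong.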
This machine did not have the conditional scalar instructions others have
mentioned in this thread, but they are a good solution to this class of
problem: instructions are just fetched and dropped down the pipe, "masked"
instructions are simply no-op'ed, and the scalar pipeline is not disturbed.

-Stan