Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!think.com!bbn.com!slackey
From: slackey@bbn.com (Stan Lackey)
Newsgroups: comp.arch
Subject: Re: isamax and instruction set design
Message-ID: <64761@bbn.BBN.COM>
Date: 19 Jun 91 17:54:10 GMT
References: <396@validgh.com> <1991Jun13.234834.22970@neon.Stanford.EDU> <1991Jun14.134338.4673@linus.mitre.org> <1991Jun14.163141.17728@rice.edu> <64739@bbn.BBN.COM> <677323133.AA763@flaccid> <MCCALPIN.91Jun19083815@pereland.cms.udel.edu>
Sender: news@bbn.com
Reply-To: slackey@BBN.COM (Stan Lackey)
Organization: Bolt Beranek and Newman Inc., Cambridge MA
Lines: 36

>>>>>> On 19 Jun 91 09:18:53 GMT, tonys@pyra.co.uk (Tony Shaughnessy) said:
>
>>I once worked on a machine that had instructions like x=min(y,z) in
>>hardware.  Also there were vector instructions that would elementwise
>>compare vectors and build a vector of logicals which could be applied
>>to a subsequent vector op.  So a loop of the form
>>
>>  where (a(i) .lt. b(i)) c(i) = d(i) + e(i)
>>
>>could be done with two vector instructions and no branches (other than
>>loop control, of course).
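A minimal sketch of the two-instruction, branch-free form of the `where` loop quoted above. It assumes plain Python lists stand in for vector registers. The function names are hypothetical and are not the machine's actual mnemonics. The first "instruction" builds the vector of logicals. The second applies it as a mask to the add.

```python
# Branch-free form of:  where (a(i) .lt. b(i)) c(i) = d(i) + e(i)
# Lists model vector registers; the function names are hypothetical.

def vector_compare_lt(a, b):
    """Vector instruction 1: elementwise a < b, giving a vector of logicals."""
    return [x < y for x, y in zip(a, b)]


def masked_vector_add(mask, c, d, e):
    """Vector instruction 2: c(i) = d(i) + e(i) only where mask(i) is true.

    Where the mask is false, c(i) keeps its old value. The conditional
    expression models a per-element select in the datapath, not a branch
    in the instruction stream.
    """
    return [di + ei if m else ci for m, ci, di, ei in zip(mask, c, d, e)]


a = [1.0, 5.0, 2.0, 7.0]
b = [3.0, 4.0, 2.5, 6.0]
c = [0.0, 0.0, 0.0, 0.0]
d = [10.0, 20.0, 30.0, 40.0]
e = [1.0, 2.0, 3.0, 4.0]

mask = vector_compare_lt(a, b)        # [True, False, True, False]
c = masked_vector_add(mask, c, d, e)  # [11.0, 0.0, 33.0, 0.0]
print(c)
```

The only control flow left is the loop itself. In hardware, that corresponds to the vector length or strip-mining logic.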
>
>Tony> How many cycles would it take to run each of these instructions?
>Tony> If it would take many cycles, then wouldn't this dominate any
>Tony> expense incurred in taking a branch, or are there other reasons
>Tony> for avoiding a branch?

The min/max took one or two cycles, depending upon which version of
the CPU you had.  The interesting thing about the scalar machine was
that it was heavily pipelined and used branch prediction.
Unfortunately, branches that depend upon floating-point values are
often mispredicted about half the time, causing the pipe to be
"drained" and restarted.  Here the min/max took the same amount of
time as the compare by itself; with a compare-and-branch sequence, the
cost of the branch and the following assignment would be added on top
of the compare.
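The comparison above can be written as a rough expected-cost model. This sketch makes two assumptions. First, the branchy sequence always pays for the compare, the branch, and the assignment. Second, a mispredicted branch adds a fixed pipe-drain penalty. All cycle counts in the usage lines are hypothetical and are not measurements of the machine described.

```python
def branchy_cycles(t_compare, t_branch, t_assign, p_mispredict, t_penalty):
    """Expected cycles for: compare; conditional branch; assignment.

    p_mispredict is the fraction of branches guessed wrong; each wrong guess
    drains and restarts the pipe at a cost of t_penalty cycles.
    """
    return t_compare + t_branch + t_assign + p_mispredict * t_penalty


def minmax_cycles(t_compare):
    """A min/max instruction costs about the same as the compare alone."""
    return t_compare


# Hypothetical numbers, chosen only for illustration.
print(branchy_cycles(t_compare=2, t_branch=1, t_assign=1,
                     p_mispredict=0.5, t_penalty=10))  # 9.0
print(minmax_cycles(t_compare=2))                      # 2
```

With a coin-flip misprediction rate, the drain penalty usually dominates. That is the case for doing min/max in hardware even when it takes a cycle or two.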

The vector case took one cycle per element plus an overhead of a few
cycles per instruction, so the total time for the two instructions was
2*(#elements + overhead).
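The timing formula above, written out as a small function. The element count and overhead in the usage lines are hypothetical. The post gives the overhead only as "a few cycles."

```python
def vector_where_cycles(n_elements, overhead_per_instr, n_instructions=2):
    """Total cycles for a masked vector 'where' statement.

    Each vector instruction costs one cycle per element plus a fixed
    startup overhead. The statement needs two instructions: the compare
    that builds the mask, and the masked operation.
    """
    return n_instructions * (n_elements + overhead_per_instr)


cycles = vector_where_cycles(64, 4)  # hypothetical: 64 elements, 4-cycle overhead
print(cycles)                        # 136
print(cycles / 64)                   # 2.125 cycles per element
```

As vectors get longer, the per-element cost approaches 2 cycles because the fixed overhead is amortized.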

This machine did not have the conditional (predicated) scalar
instructions that others in this thread have mentioned, but they are a
good solution to this class of problem: instructions are simply
fetched and dropped down the pipe, "masked" instructions become
no-ops, and the scalar pipeline is never disturbed.
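A toy sketch of predicated scalar execution, using the same `where` loop body. It assumes a made-up two-operation instruction set ("lt" and "add") and named registers. `Instr` and `run` are hypothetical names. Every instruction is issued in order. One whose predicate register is false has no effect, so the instruction stream is never redirected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    op: str                     # "lt" or "add" (hypothetical ISA)
    dest: str
    src1: str
    src2: str
    pred: Optional[str] = None  # predicate register name; None = always execute


def run(program, regs):
    """Issue every instruction in order; no branches are ever taken."""
    for ins in program:
        if ins.pred is not None and not regs[ins.pred]:
            continue  # masked: treated as a no-op, registers unchanged
        x, y = regs[ins.src1], regs[ins.src2]
        if ins.op == "lt":
            regs[ins.dest] = x < y
        elif ins.op == "add":
            regs[ins.dest] = x + y
        else:
            raise ValueError(f"unknown op {ins.op!r}")
    return regs


# Scalar loop body of: where (a(i) .lt. b(i)) c(i) = d(i) + e(i)
body = [
    Instr("lt", "p", "a", "b"),
    Instr("add", "c", "d", "e", pred="p"),
]

print(run(body, {"a": 1.0, "b": 3.0, "c": 0.0, "d": 10.0, "e": 1.0})["c"])  # 11.0
print(run(body, {"a": 5.0, "b": 4.0, "c": 0.0, "d": 20.0, "e": 2.0})["c"])  # 0.0
```

Both cases execute the same two instructions in the same order. Only the data decides whether the add takes effect, and that is what keeps the pipe from draining.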

-Stan