Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!wuarchive!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: RISC vs. CISC -- SPECmarks
Message-ID: <MCCALPIN.91May7092645@pereland.cms.udel.edu>
Date: 7 May 91 13:26:45 GMT
References: <3423@charon.cwi.nl> <11602@mentor.cc.purdue.edu>
	<1991Apr30.163153.18568@midway.uchicago.edu>
	<1991May2.162909.9165@news.arc.nasa.gov> <819@cadlab.sublink.ORG>
	<1991May7.061500.7485@marlin.jcu.edu.au>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 52
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: csrdh@marlin.jcu.edu.au's message of 7 May 91 06:15:00 GMT

>> On 7 May 91 06:15:00 GMT, csrdh@marlin.jcu.edu.au (Rowan Hughes) said:

Rowan> I'm a little puzzled by the discussions involving vector vs.
Rowan> risc s-scalar.  Given similar hardware, and an appropriate
Rowan> (vectorizable) algorithm the vector method should always be
Rowan> much faster. 

I am not sure exactly what question you are asking here.  If you are
just saying that a vectorizable algorithm will run faster if you
vectorize it, then I agree.  However, a non-vectorizable algorithm on
a fast scalar machine can often outperform a vector algorithm for the
same problem on a vector machine of similar technology.

The primary difference is usually computational complexity.  For
example, Gaussian elimination for tridiagonal matrices requires O(N)
work and is not vectorizable, while cyclic reduction requires
O(N log N) work and is vectorizable.  The relative performance of the
algorithms is thus a balance between the extra work required by the
vector algorithm and the extra performance of the vector hardware.
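
To make the "not vectorizable" point concrete, here is a minimal
Python sketch of Gaussian elimination without pivoting on a
tridiagonal system.  The function name solve_tridiagonal and the
argument layout are assumptions made for illustration only.  Each
pass through either loop needs the value produced by the previous
pass, and that recurrence is what blocks straightforward
vectorization.

```python
def solve_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system by Gaussian elimination, no pivoting.

    a: sub-diagonal (a[0] is ignored), b: main diagonal,
    c: super-diagonal (c[-1] is ignored), d: right-hand side.
    All four are lists of length n.  Hypothetical layout, chosen
    for this sketch only.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]

    # Forward elimination: step i depends on cp[i-1] and dp[i-1].
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution: step i depends on x[i+1].
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The total work is a small constant number of operations per unknown,
i.e. O(N), but the loops have to run in order.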

A secondary difference concerns memory bandwidth.  Most of the
machines that we have been discussing have insufficient memory
bandwidth to run long vector operations at full speed.  Thus,
algorithms that avoid excess memory accesses (like the inner product
algorithm for matrix multiplies) will run faster than algorithms of
the same computational complexity that use a standard "vector"
approach (like the SAXPY in the inner loop of the outer product
algorithm for matrix multiplies).
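
As a rough illustration, the Python sketch below writes both forms
for square matrices and counts array loads and stores.  The function
names and the counting model are assumptions of this sketch: the
inner product accumulator and the SAXPY scalar are treated as living
in registers, and every array element touched counts as one memory
access.  It is not a measurement of any real machine.

```python
def matmul_inner(A, B):
    """Inner product form: C[i][j] is accumulated in a 'register'."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    loads = stores = 0
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += A[i][k] * B[k][j]  # 2 loads, no store
                loads += 2
            C[i][j] = acc                 # 1 store per result element
            stores += 1
    return C, loads, stores


def matmul_saxpy(A, B):
    """Outer product form: the inner loop is a SAXPY on a column of C."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    loads = stores = 0
    for k in range(n):
        for j in range(n):
            s = B[k][j]                   # scalar multiplier, 1 load
            loads += 1
            for i in range(n):
                C[i][j] += A[i][k] * s    # 2 loads and 1 store
                loads += 2
                stores += 1
    return C, loads, stores
```

Under this model both forms do the same n^3 multiply-adds, but the
SAXPY form issues a store on every inner iteration (n^3 stores)
while the inner product form stores each result once (n^2 stores).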

Rowan> Risc s-scalar machines are still essentially SISD.
Rowan> Also is a true vector machine using risc hardware likely to
Rowan> emerge soon?  Hope my ignorance isn't too obvious.

Vector instructions are also essentially SISD at the hardware level.
When you execute a vector instruction on a vector machine, it is doing
(almost) exactly the same thing as a Killer Micro running a tight loop
feeding data into a pipelined FPU.  

To extend a Killer Micro to the functionality of a Cray Y-MP will
still require quite a bit of work.  The ability of the Cray to handle
2 vector loads, one vector store, one vector add, and one vector
multiply simultaneously does not easily fit into the RISC paradigm,
since it depends on the existence of multi-cycle instructions.
Superscalar does not seem exactly the way to go, unless the load-store
units are made independent of the integer and float units.
Reproducing the functionality of the Cray Y-MP seems closer to VLIW
than to most of the RISC approaches in use now....
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET