Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!apple!amdahl!sbf10
From: sbf10@uts.amdahl.com (Samuel Fuller)
Newsgroups: comp.arch
Subject: Re: More RISC vs. CISC wars
Message-ID: <d2iQ02ny3d5a01@amdahl.uts.amdahl.com>
Date: 13 Jul 89 02:40:03 GMT
References: <42621@bbn.COM> <13985@lanl.gov>
Reply-To: sbf10@amdahl.uts.amdahl.com (Samuel Fuller)
Organization: Amdahl Corporation, Sunnyvale CA
Lines: 58

In article <13985@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>From article <42621@bbn.COM>, by slackey@bbn.com (Stan Lackey):
>> [...]
>> As I hope I clarified above, the pipeline allows a very long sequence
>> of operations, including a memory access, to consume effectively one
>> cycle of execution time.  Specifically, memory-to-register floating
>> point takes six cycles from front to back, but with the pipeline
>> really consumes only one cycle.
>
>Or it really consumes six!!  Depends upon whether there is anything
>independent to do while this instruction runs.  If the next instruction
>depends on the result of this one, the next gets delayed six clocks. Period.

If a RISC machine has data dependencies then it's stuck too, right?

>
>With a RISC instruction set, you can move the individual components of
>this complex "instruction" around to get maximum overlap from your pipeline.

I hardly consider a memory-to-register multiply a complex instruction.

For an example of a complex instruction see the TRT (Translate and Test)
instruction in the IBM 370 POO (Principles of Operation).  These are the
instructions that RISC rightly throws out.

>Splitting the functionality of the instruction requires more instruction
>issues, but it also allows better flexibility in instruction scheduling
>optimizations.  It would require a _very_ smart compiler to tell which
>way to go.  This is exactly one of the points I made originally about
>CISCs being harder to compile for.


Look at it this way.  To perform a floating-point multiply on two
operands that reside in memory, this machine takes two slots down the
pipe.

Prev Inst           DATBXW
LOAD OP1 to REG1     DATBXW     Load can be bypassed back into X for Mult
Mult REG1 by OP(mem)  DATBXW    Multiply is finished after the X
Next Inst              DATBXW

All RISC machines that I know about are Load/Store machines. So given
the same pipeline they would take at least three slots to perform the
operation.

Prev Inst           DATBXW
LOAD OP1 to REG1     DATBXW
LOAD OP2 to REG2      DATBXW
Mult REG1 by REG2      DATBXW    Multiply is finished after the X
Next Inst               DATBXW
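
The slot counts in the two diagrams can be checked with a small sketch.
It assumes what the diagrams assume: one instruction enters the pipe per
cycle, every instruction passes through the same six stages (DATBXW), and
bypassing removes all stalls.  The function names are made up for
illustration and are not taken from any real tool.

```python
STAGES = "DATBXW"  # stage letters as drawn in the diagrams


def diagram(instructions):
    """Return one text row per instruction, each issued one cycle after the last."""
    width = max(len(name) for name in instructions)
    return [
        f"{name:<{width}}  {' ' * slot}{STAGES}"
        for slot, name in enumerate(instructions)
    ]


def total_cycles(n_instructions):
    """Cycles from the first D to the last W, assuming no stalls."""
    # The last instruction enters n_instructions - 1 cycles after the first
    # and then needs len(STAGES) cycles to drain.
    return n_instructions - 1 + len(STAGES)


# Memory-operand multiply: two slots down the pipe.
mem_operand = ["LOAD OP1 to REG1", "Mult REG1 by OP(mem)"]
# Load/store version of the same multiply: three slots.
load_store = ["LOAD OP1 to REG1", "LOAD OP2 to REG2", "Mult REG1 by REG2"]

if __name__ == "__main__":
    for name, seq in (("memory operand", mem_operand), ("load/store", load_store)):
        print(f"{name}: {len(seq)} slots, {total_cycles(len(seq))} cycles")
        print("\n".join(diagram(seq)))
```

Under these assumptions the memory-operand sequence occupies the pipe for
7 cycles and the load/store sequence for 8, so the difference is exactly
the one extra issue slot.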

A pipeline is a pipeline.  The pipelines on our 370 machines have a shorter
cycle time than any RISC processor on the market.  370 is definitely
not RISC.  RISC is wonderful stuff, but it is not necessary to make a
fast computer.  RISC just allows you to make a fast computer quickly (read:
design time) and cheaply (read: single-chip CPU).  Our machines are fast,
but they take forever to design and cost a fortune.  But people buy them :).

Sam Fuller / Amdahl System Performance Architecture