Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!apple!amdahl!sbf10
From: sbf10@uts.amdahl.com (Samuel Fuller)
Newsgroups: comp.arch
Subject: Re: More RISC vs. CISC wars
Message-ID:
Date: 13 Jul 89 02:40:03 GMT
References: <42621@bbn.COM> <13985@lanl.gov>
Reply-To: sbf10@amdahl.uts.amdahl.com (Samuel Fuller)
Organization: Amdahl Corporation, Sunnyvale CA
Lines: 58

In article <13985@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>From article <42621@bbn.COM>, by slackey@bbn.com (Stan Lackey):
>> [...]
>> As I hope I clarified above, the pipeline allows a very long sequence
>> of operations, including a memory access, to consume effectively one
>> cycle of execution time. Specifically, memory-to-register floating
>> point takes six cycles from front to back, but with the pipeline
>> really consumes only one cycle.
>
>Or it really consumes six!! Depends upon whether there is anything
>independent to do while this instruction runs. If the next instruction
>depends on the result of this one, the next gets delayed six clocks. Period.

If a RISC has data dependencies then it's stuck too, right?

>
>With a RISC instruction set, you can move the individual components of
>this complex "instruction" around to get maximum overlap from your pipeline.

I hardly consider a memory-to-register multiply a complex instruction.
For an example of a complex instruction see the TRT instruction in
the IBM 370 POO. These are the instructions that RISC rightfully
throws out.

>Splitting the functionality of the instruction requires more instruction
>issues, but it also allows better flexibility in instruction scheduling
>optimizations. It would require a _very_ smart compiler to tell which
>way to go. This is exactly one of the points I made originally about
>CISCs being harder to compile for.

Look at it this way. To perform a floating point multiply on two
operands which exist in memory this machine will take two slots down
the pipe to perform the operation.

Prev Inst              DATBXW
LOAD OP1 to REG1        DATBXW    load can be bypassed back into X for Mul
Mult REG1 by OP(mem)     DATBXW   Multiply is finished after the X
Next Inst                 DATBXW

All RISC machines that I know about are Load/Store machines. So given
the same pipeline they would take at least three slots to perform
the operation.

Prev Inst              DATBXW
LOAD OP1 to REG1        DATBXW
LOAD OP2 to REG2         DATBXW
Mult REG1 by REG2         DATBXW  Multiply is finished after the X
Next Inst                  DATBXW

A pipeline is a pipeline.

The pipelines on our 370 machines have a shorter cycle time than any
RISC processor on the market. 370 is definitely not RISC.

RISC is wonderful stuff. But it is not necessary to make a fast
computer. RISC just allows you to make a fast computer quickly (read
design time) and cheaply (read single chip CPU).

Our machines are fast but they take forever to design and cost a fortune.
But people buy them :).

Sam Fuller / Amdahl System Performance Architecture
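
The slot counts in the two diagrams above can be reproduced with a small sketch. It assumes one instruction issued per cycle, no stalls, and a bypass that delivers load data in time for the multiply's X stage. The names STAGES, schedule, diagram and multiply_done_cycle are made up for illustration and model nothing beyond the six-stage DATBXW pipe shown in the post.

```python
# Hypothetical model of the six-stage DATBXW pipe from the diagrams above.
# Assumptions: one issue per cycle, no stalls, load bypass feeds X in time.
STAGES = "DATBXW"

# Memory-to-register machine: one load plus a multiply with a memory operand.
MEM_TO_REG = ["LOAD OP1 to REG1", "Mult REG1 by OP(mem)"]

# Load/store machine: both operands must be loaded before the multiply.
LOAD_STORE = ["LOAD OP1 to REG1", "LOAD OP2 to REG2", "Mult REG1 by REG2"]


def schedule(instructions, stages=STAGES):
    """Return (name, issue_cycle, x_cycle) for each instruction.

    Instruction i enters stage D in cycle i and reaches X a fixed
    number of cycles later, since nothing stalls in this model.
    """
    x_offset = stages.index("X")
    return [(name, slot, slot + x_offset)
            for slot, name in enumerate(instructions)]


def multiply_done_cycle(instructions):
    """Cycle in which the last instruction (the multiply) is in X."""
    _, _, x_cycle = schedule(instructions)[-1]
    return x_cycle


def diagram(instructions, stages=STAGES):
    """Render a staggered text diagram, one row per instruction."""
    width = max(len(name) for name in instructions) + 2
    return "\n".join(name.ljust(width) + " " * slot + stages
                     for slot, name in enumerate(instructions))


if __name__ == "__main__":
    for label, seq in (("memory-to-register", MEM_TO_REG),
                       ("load/store", LOAD_STORE)):
        print(label, "slots:", len(seq),
              "multiply in X at cycle", multiply_done_cycle(seq))
        print(diagram(seq))
```

Under these assumptions the load/store sequence costs one extra issue slot and finishes its multiply one cycle later. The sketch says nothing about what happens when the next instruction depends on the product, which is the case Jim Giles raises.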