Xref: utzoo comp.arch:6651 alt.next:170
Path: utzoo!hoptoad!pacbell!ames!amdahl!amdcad!crackle!tim
From: tim@crackle.amd.com (Tim Olson)
Newsgroups: comp.arch,alt.next
Subject: Re: RISC v. CISC (was The NeXT problem)
Message-ID: <23290@amdcad.AMD.COM>
Date: 17 Oct 88 23:12:24 GMT
References: <156@gloom.UUCP>
Sender: news@amdcad.AMD.COM
Reply-To: tim@crackle.amd.com (Tim Olson)
Organization: Advanced Micro Devices, Inc. Sunnyvale CA
Lines: 59

In article <156@gloom.UUCP> cory@gloom.UUCP (Cory Kempf) writes:
| A while back, I was really hot on the idea of RISC.  Then a friend
| pointed out a few things that set me straight...

I guess we are going to have to reset you straight, again! ;-)

| First, there is no good reason that all of the cache and pipeline
| enhancements cannot be put on to a CISC processor.

If it is a microcoded processor, then the CISC machine will have to
perform this pipelining at both the microinstruction and
macroinstruction levels in order to execute simple instructions in a
single cycle.  This costs more than if the micro and macro levels were
the same (RISC).

| Second, to perform a complex task, a RISC chip will need more
| instructions than a CISC chip.

This is true, although dynamic measurements typically show only 30%
more, not the "3 to 5 times" that some people report.

| Third, given the same level of technology for each (ie caches, pipelines,
| etc), a microcode fetch is faster than a memory fetch.

Also true.  However, this only buys you anything if most of your
instructions take multiple cycles.  Unfortunately (?), most programs
use simple instructions, which should execute in a single cycle.  If a
CISC processor is to compete effectively, it must also be able to
execute the most-used instructions in a single cycle.  Therefore, it
must also have the off-chip instruction bandwidth or on-chip cache
bandwidth that RISC requires.
With this requirement, it doesn't matter that microcode may be slightly
faster than a cache access -- the cache is the limiting factor.

| As an aside, the 68030 can do a 32 bit multiply in about (If I remember
| correctly -- I don't have the book in front of me) 40 cycles.  A while
| back, I tried to write a 32 bit multiply macro that would take less
| than the 40 or so that the '030 took.  I didn't even come close (even
| assuming lots of registers and a 32 bit word size (which the 6502
| doesn't have)).

Most (if not all) RISCs address this by

  a) using existing floating-point multiply hardware (i.e. a 32x32
     multiplier array) for integer multiply (1 - 4 cycles), or

  b) having multiply sequencing or step operations that perform 1-2
     bits at a time (16 - 40 cycles),

so they are no slower than the current crop of CISC processors.  In
addition, if step operations are used, inexpensive "early-out"
calculations will allow the average multiply time to drop quite a bit
(because the distribution of runtime multiplies leans heavily towards
multipliers of 8 bits or less).

	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)
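
The step-operation approach with early-out described above can be
sketched in software.  This is only an illustrative model, not the
instruction sequence of any particular processor: it assumes an
unsigned multiply, one multiplier bit consumed per step, and a free
early-out test.  The names multiply_steps and MASK32 are made up for
this example, and the returned step count is a count of loop
iterations, not a cycle count.

```python
MASK32 = 0xFFFFFFFF  # keep values within a 32-bit word


def multiply_steps(multiplicand, multiplier, early_out=True):
    """Unsigned shift-and-add multiply, one multiplier bit per step.

    Returns (low 32 bits of the product, number of steps executed).
    With early_out=True the loop stops as soon as no set bits remain
    in the multiplier, so small multipliers finish in fewer steps.
    """
    a = multiplicand & MASK32
    b = multiplier & MASK32
    product = 0
    steps = 0
    for _ in range(32):
        if early_out and b == 0:
            break                      # remaining multiplier bits are all zero
        if b & 1:
            product = (product + a) & MASK32
        a = (a << 1) & MASK32          # next partial product
        b >>= 1                        # consume one multiplier bit
        steps += 1
    return product, steps


if __name__ == "__main__":
    print(multiply_steps(1000, 200))                   # (200000, 8)
    print(multiply_steps(1000, 200, early_out=False))  # (200000, 32)
```

In this model an 8-bit multiplier such as 200 finishes in 8 steps
instead of 32, which is the effect the early-out argument relies on
when most runtime multipliers are 8 bits or less.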