Xref: utzoo comp.arch:6813 alt.next:239
Path: utzoo!hoptoad!pacbell!amdahl!pyramid!prls!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch,alt.next
Subject: Re: RISC v. CISC (really comments on many postings: LONG)
Message-ID: <6865@winchester.mips.COM>
Date: 25 Oct 88 08:11:01 GMT
References: <156@gloom.UUCP>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 536

ARGH! I'm away for a week and comp.arch goes crazy! :-)
Rather than try to post multiple replies to the hordes of RISC-CISC stuff,
I've glommed them together:

>Article 6913 of comp.arch:
>From: cory@gloom.UUCP (Cory Kempf)

>A while back, I was really hot on the idea of RISC.  Then a friend 
>pointed out a few things that set me straight...

[1] At least one of the things is pretty misleading:

>First, there is no good reason that all of the cache and pipeline
>enhancements cannot be put on to a CISC processor.

Cache and pipeline enhancements help CISCs also;  people are always
working on making CISCs go faster by better (deeper) pipelining and caching.
(The literature has plenty of examples of the efforts of CISC implementors
to make their existing architectures go faster.)

However, there are some FUNDAMENTAL ways that most popular CISC architectures
differ from the higher-performance RISCs.  Here are a few, and why they
matter:
			CISCs			RISCs
EFFICIENT (DEEP)	possible, but		designed for this
PIPELINE		expensive in hardware
			and/or design time

	example:	variable-size instrs,	32-bit instrs
			sequential decode (VAX)	
			complex side-effects	at most simple side-effects
	example:	conditional branches	delayed-branches
			(tricky, much hardware)

SEPARATE I&D cache	maybe, but sometimes	usually, and don't support
			must support old code	store-into-instr stream w/o
			that does store-into-	explicit info
			instr-stream (this is
			a royal pain, since you pay hardware in the fastest
			part of the machine for something that seldom happens).
			(Yes, I was bad too: a popular S/360 program I wrote
			almost 20 years ago used this "feature".  sigh.)

	examples:	comparators in Amdahls watching for I-stream stores

ADDRESSING MODES	can be very complex,	usually just load/store with
			including side-effects	at most indexing & auto- +/-
			and page-crossings	with no page-crossings
	examples:	VAX; new modes in 68020
			Note that complex addressing modes can interact horribly
			with deep pipelining, because the very things you want
			to do to make it go fast add complexity and/or state
			in the fastest parts of the machines.

DESIGNED FOR		maybe, maybe not	yes
OPTIMIZERS
	examples:	registers either	32 or more GP registers
			insufficient, or	available for allocation
			split up in odd	
			ways.
	example:	When you count general-purpose regs available for
			general allocation, a 386 gives you about 5-6,
			I think, compared to maybe 26-28 on an R3000,
			SPARC, HP PA, etc.  No amount of caching and
			pipelining makes 5 look like 26 to an optimizer.
			(This is not to say a good optimizer won't HELP,
			and in fact, Prime bought our compilers because it
			will help them; it just doesn't help as much.)

EXPOSED PIPELINE	usually not		usually some
	example:	It helps to reorganize code on CISCs (like S/360s)
			to cover load latencies, and spread settings of
			condition codes apart from the branch-conditions
			(on some models), but RISCs usually cater to these.
			Note that machines with complex address-modes built
			into the instructions are hard to do this with,
			i.e., the compilers can't easily split instructions with
			memory-indirect loads, for example, to get a smoother
			pipeline flow.

EXCEPTION-HANDLING	can get complex		relatively simpler
	example:	Exception-handling in heavily-pipelined CISCs
			not designed for that can either get very tricky,
			take a while to design and get right, or burn
			a lot of hardware, or all 3.
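
To make the EXPOSED PIPELINE row concrete, here is a minimal sketch of
why reorganizing code covers load latency.  It assumes a toy machine with
a single load-delay slot; the Instr tuple, the register names, and the
count_load_stalls helper are invented for illustration and do not describe
any particular CPU.

```python
# Toy model: a load's result is not usable by the very next instruction
# (one-cycle load delay assumed; real pipelines differ).
from collections import namedtuple

Instr = namedtuple("Instr", "op dest srcs")

def count_load_stalls(program):
    """Count cycles lost when an instruction reads a register that was
    loaded by the instruction immediately before it."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "load" and prev.dest in cur.srcs:
            stalls += 1
    return stalls

# Naive order: each add immediately follows the load it depends on.
naive = [
    Instr("load", "r1", ("r10",)),
    Instr("add",  "r2", ("r1", "r5")),
    Instr("load", "r3", ("r11",)),
    Instr("add",  "r4", ("r3", "r6")),
]

# Reorganized order: the two independent loads go back to back, so each
# add sits at least one instruction away from the load that feeds it.
scheduled = [naive[0], naive[2], naive[1], naive[3]]

print(count_load_stalls(naive))      # 2
print(count_load_stalls(scheduled))  # 0
```

An instruction with a memory operand fuses the load and its use into one
unit, so there is no gap for the compiler to fill; that is the difficulty
with complex address modes noted in the table.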

Note that hardware complexity is especially an issue in VLSI:
it is relatively easy to get dense regular structures on a given
amount of silicon (registers, MMUs, caches), but complex logic burns
it up fast, and routing can get tricky.

These are a few of the salient areas that illustrate a common
principle: there's hardly anything you can do in a RISC [except perhaps
cleanly increase the simultaneously-available registers] that you
can't also do in a CISC.
HOWEVER, IT MAY TAKE YOU SO LONG TO GET IT RIGHT, OR COST YOU SO MUCH,
THAT IT DOESN'T MAKE COMPETITIVE SENSE TO DO IT!!!!
More than one large, competent computer company has discovered this
fact, which is why you often see multiprocessors being popular at
certain times, i.e., because it's easier to gang them together than
it is to make them go faster.
The problems often show up in 2 places:
	bus interface (including MMU)
	exception-handling
I'm sure any OS person out there who's dealt with early samples of
32-bit micros still has nightmares over some of these [How about
some postings on the chip bugs you remember worst!  I'll start with one:
UNIX always seems to find these $@!% things, which somehow have slipped
thru diags. Our 1973 PDP 11/45 had a bug which was only seen on UNIX,
because it used the MMU differently than DEC did, and the C compiler
often used some side-effectful addressing mode that DEC seldom used:
as I recall, if you accessed the stack with a particular sequence,
and a page boundary got crossed, and a trap resulted, something bad
happened.]

Making CISCs go faster is an interesting and worthy art in its own
right, and is certainly a good idea for anybody with a serious
installed base.  However, it does get hard: one of the architects
of a popular CISC system once told me that making it go much faster
(other than with circuit speedups) seemed beyond human comprehensibility
to do in a reasonable timeframe.

>Article 6914 of comp.arch:
>Subject: Re: RISC v. CISC
>Reply-To: rang@cpswh.cps.msu.edu (Anton Rang)

>In article <156@gloom.uucp>, Cory Kempf (decvax!encore!gloom!cory) writes:
>>First, there is no good reason that all of the cache and pipeline....

>  This is definitely true.  Look at the caching on the 68030, or the
>Z80,000 for instance.  The advantage a RISC gives you is more space
>for caching logic, though--so you can have a bigger cache (or more
>registers, or possibly both).

[2] Again, there is no good reason not to use caches, but there are good
reasons why deeper CISC pipelines sometimes get very expensive.

Re: Z80,000: is that a real chip?  [Real = actually shipping to people
in at least large sample quantities; would be nice to see UNIX running, etc].
Note: you can find magazine articles describing it in detail, as though
it were imminently available....the problem is, some of those articles
are now 4 years old...If it doesn't really exist as a product, how can
it be cited as an example to prove anything? (If it is really out there
in use, please post some more to that effect and this comment will go away.)

>Article 6918 of comp.arch:
>From: baum@Apple.COM (Allen J. Baum)
>Subject: Re: RISC v. CISC --more misconceptions

[3] (Allen properly replies to many of the original misconceptions,
omitting only the discussion in [1] above on difficulty of deep pipelining
on some CISCs.)

>Article 6920 of comp.arch:
>From: sbw@naucse.UUCP (Steve Wampler)
>Subject: CISCy RISC? RISCy CISC?

>Just what is it about RISC vs. CISC that really sets them apart?
>... Other than that, I doubt I would care
>whether my machine is RISC or CISC, if I can even tell them apart.

[4] ABSOLUTELY RIGHT!  Most people should care less whether it's RISC
or CISC, just whether it does the job needed, goes fast, and is cheap.

>A case in point.  I know of a not-yet-announced machine (perhaps
>never to be announced machine) that has just about the largest
>instruction set I can imagine (not to mention the 15+ addressing
>modes)....
>The result is a 12.5MHz machine that runs 25000 (claimed)
>dhrystones using what I would call a 'throwaway' C compiler....

As you note, not-yet-announced.  On the other hand, MIPS R3000s
do 42K Dhrystones, and they're already in real machines, and vendors
are quoting the CPUs at $10/mip, i.e., $200 for 25MHz parts.

>Now, I've missed most of the RISC/CISC wars, but these seem to
>me to be very fine numbers, at least compared with the
>uVAXen I've played with (all of which cost more).
But uVAXen are real...
>How do they compare to current RISCs?  I'd bet pretty much the same.
>I personally couldn't care which machine I'd own (not that I can
>afford any).  When the really fast chips come in, I bet the RISC
>machines are the first to come out, but still, is there something
>that will keep CISC from catching up?
See the discussion in [1] above.  Also note: in a time when the
design cycle is 12-18 months, and people double performance in that
period, being that far behind means a factor of 2X in performance....
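
A back-of-the-envelope sketch of that arithmetic, assuming performance
grows as a steady exponential (the function name and the smooth-growth
model are illustrative assumptions, not measured data):

```python
def performance_gap(lag_months, doubling_months):
    """Factor by which the competition has pulled ahead during the lag,
    assuming performance doubles every `doubling_months`."""
    return 2.0 ** (lag_months / doubling_months)

# Being one full design cycle behind costs a factor of 2, whether the
# cycle is 12 or 18 months long.
print(performance_gap(12, 12))  # 2.0
print(performance_gap(18, 18))  # 2.0

# Being 18 months behind while the field doubles every 12 months costs more.
print(round(performance_gap(18, 12), 2))  # 2.83
```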

>Article 6936 of comp.arch:
>From: daveh@cbmvax.UUCP (Dave Haynie)
>>>It seems that the NeXT machine may have a few problems:

>>>1) Outdated Processor Technology: NeXT just missed the wave of fast RISC 
>>>   processors.  The 5 MIPS 68030 is completely out performed by the currently
>>>   available RISC chips (Motorola, MIPS, Sparc) that run at approximately
>>>   20 VAX (they claim) MIPS.  In a year or two, ECL versions of some of these
>>>   RISC chips will be running at 40 to 50 MIPS.

>Priced the ~8 MIPS Sun 4 lately?  Or the ~14 MIPS 88K chipset.  How about
>an Apollo 10K?  RISC machines are starting to get fast, and they're even
>starting to get down in price, but these two directions haven't met yet.

[5] Actually, this is the wrong reason: you can put together MIPS chipsets
at similar (or even slightly better) cost/performance levels (have you
priced a 68882 lately, for example?)  However, be fair to Jobs & co:
when they started, none of the RISC chips was generally available;
some of them [88K] are not yet generally available in volume.  Try drawing
a timeline sometime of a) when you get first specs on a chip, b) when
you can design it in c) when you can make enough to get the software act
together d) when you can actually ship in volume.  IT TAKES A WHILE!
(I've commented earlier on ECL desktop hovercrafts.)

Also, betting on a new architecture at the beginning of a cycle [i.e.,
in the Z8000/68K/X86/X32 etc wars in the early 80s, and the current
BRAWL (Big RISC Architecture War & Lunacy)] is very exciting, and probably
not something a startup should do.  Consider: choosing an architecture
is like an odd form of Russian Roulette: you pick a chip and pull the
trigger, then wait a year or two to see if you've blown your brains out.
(An awful lot of workstation startups picked wrong the last time, and
they're gone, for example.)
Fortunately, the BRAWL will be over before the end of the year,
which will make life saner.

>Since the VAST majority of Suns sold to universities are Sun 3s (68020 based)
>and below (believe it or not, folks STILL use Sun 2s here and there), I don't
>think a 68030 based system, even NeXT's, which isn't an especially fast 68030
>system (they're running it's memory at about 1/2 the possible speed), will have
>no trouble competing with the installed 68020 systems.  Or a $25,000-$50,000
>RISC based workstation.
I still think there's nothing wrong with NeXT using a 68030; there will however
be both SPARC and MIPS-based workstations a lot cheaper than $25-50K,
in volume, by the time the NeXT boxes are out in volume.

>Article 6964 of comp.arch:
>From: wkk@wayback.UUCP (W.Kapalow)
>Subject: RISC realities
>
>I have used, programmed, and evaluated most of the current crop of
>RISC chipsets.....

[6]....some reasonable analysis, from somebody with fewer axes to grind
than most of us, thank goodness!

>Chips like the Amd29000 are trying to make things better by having
>an onboard branch-target cache and blockmode instruction fetches.  Try
>getting 1-2 cycles/instruction with a R2000 with dynamic memory and no
>cache, the 29000 does much better.

Yep, although R3000s with some of the new cache-chip variants
will get to be an interesting fight here, i.e., since the R2000/R3000
has all of the cache control on-chip, and there are new cheap, small
FIFO parts that eliminate the write buffers.

>....  Look at the AT&T CRISP processor,  ....
Worth doing: some interesting ideas, regardless of commercial issues.

>Article 6968 of comp.arch:
>From: peter@ficc.uu.net (Peter da Silva)
>Subject: RISC/CISC and the wheel of life.

>I have noticed one very interesting thing about RISCs lately... they are
>getting quite sophisticated instruction sets. 3-address operations and
>addressing modes aren't what I used to associate with RISC, but if you look
>at them they turn out to be refinements of older RISCs.

[7]  This is very confusing.  Most RISCs use 3-address operations, i.e.,
	reg3 = reg1 OP reg2.
			rather than just 2-address ops:
	reg1 = reg1 OP reg2

Certainly, these include, but are not limited to: IBM 801, HP PA,
MIPS R2000, SPARC, 29K, 88K.

>What's happening, of course, is that the chips are so much faster than any
>sort of affordable RAM that it's worthwhile to put more into the instructions.
>The speed of the system as a whole goes up, since the chip can still handle
>all three register references in one external clock. No point in fetching
>instructions any faster than that...

I think this obfuscates the issue.  Any reasonable design has a register
file that has at least 2 read-ports and 1 write-port, i.e., can do
2 register reads and 1 write per cycle.  BOTH 3-address and 2-address
forms need to do those 2 reads & 1 write; the only difference is that
the 2-address form allows a denser instruction encoding, but the
base hardware is rather similar.
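
A minimal sketch of that point, using a toy register file that counts
port usage.  The RegisterFile class and the op3/op2 helpers are invented
for illustration; they model the read/write counts only, not any real
chip's datapath.

```python
import operator

class RegisterFile:
    """Toy register file that counts read-port and write-port uses."""
    def __init__(self, nregs=32):
        self.regs = [0] * nregs
        self.reads = 0
        self.writes = 0

    def read(self, n):
        self.reads += 1
        return self.regs[n]

    def write(self, n, value):
        self.writes += 1
        self.regs[n] = value

def op3(rf, dst, src1, src2, fn):
    """3-address form: reg[dst] = reg[src1] OP reg[src2]."""
    rf.write(dst, fn(rf.read(src1), rf.read(src2)))

def op2(rf, dst, src, fn):
    """2-address form: reg[dst] = reg[dst] OP reg[src].
    Same datapath; the destination simply doubles as a source."""
    op3(rf, dst, dst, src, fn)

rf3 = RegisterFile()
rf3.regs[1], rf3.regs[2] = 7, 5
op3(rf3, 3, 1, 2, operator.add)    # r3 = r1 + r2

rf2 = RegisterFile()
rf2.regs[1], rf2.regs[2] = 7, 5
op2(rf2, 1, 2, operator.add)       # r1 = r1 + r2

print(rf3.reads, rf3.writes)  # 2 1
print(rf2.reads, rf2.writes)  # 2 1
```

Both forms cost 2 reads and 1 write; the 2-address form only saves the
bits of one register specifier in the encoding.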

>Article 6970 of comp.arch:
>From: guy@auspex.UUCP (Guy Harris)
>Subject: Re: The NeXT Problem

[8]....Guy gives some reasonable comments....

>>Not trying to start a flame war, but 030's are faster than Sun 4's.

>To which '030 machine, and to which Sun-4, are you referring?  At the
>time the Sun-4/260 came out, no available '030 machine was faster
>because there weren't any '030 machines.....
>Also, you might compare '030s against MIPS-based machines; are they
>faster than them, as well?
No.

>>I puke trying to write assembly on RISC machines.

>Fortunately, I rarely had to do so, and these days fewer and fewer
>people writing applications have to do so.
>These days, "ease of writing assembly code" is less and less of a
>figure of merit for an instruction set.

100% agree; however, most people who've used our RISCs think they're
easier to deal with in assembler anyway, although they observe there's
less opportunity for writing unbearably obscure/clever code....

>Article 6974 of comp.arch:
>From: daveh@cbmvax.UUCP (Dave Haynie)
>Subject: Re: "Compatible" (was Re: The NeXT Problem)

>> In article <5941@winchester.mips.COM> John Mashey writes:
>>> This defies all logic.
>>> a) If it's compatible with an 030, it's not a RISC.
>
>> I agree with John, completely.
>
>For an example of an architecture that's 68000 compatible and RISCy to
>the point of executing most instructions in a single clock cycle, look
>no farther than the Edge computer.  However, if you want this on a 
>single chip, instead of a bunch of gate arrays, you'll have to wait.

This gets back to the point in [1]: you can throw an immense pile of
hardware and design time at an architecture to make it go faster,
but that doesn't make it a RISC architecture.  Maybe it makes it
a RISCier, or less CISCy, implementation [which is
what I meant when I said the 030 was RISCier than 020, which caused
a lot of confusion.  sorry.]  Another example is the way that
the MicroVAX chipset is a RISCier implementation of a VAX (and this
is more true than the Edge example, i.e., the MicroVAX gets by with
less hardware by moving some of the less frequent ops to software.)

>> the MC680X0's instruction set would NOT be a RISC instruction set.
>....  Consider that most
>of the RISCy CPUs on the market have been done as little baby chips,
>by ASIC houses (SPARC, MIPS).

Wrong.  The first SPARCs were gate arrays, but the
Cypress SPARCs are coming.  MIPS chips have NEVER been done
in ASICs, although LSI Logic is working on ASIC cores of them.
In our case, the CPU+MMU is about 100K transistors, which is
NOT as large as a 386 or 030, but not a "little baby chip" either.
AMD 29Ks are definitely not little baby chips either, and they're real, too.

>Article 6975 of comp.arch:
>From: daveh@cbmvax.UUCP (Dave Haynie)
>Subject: Re: The NeXT Problem

(AMD 29K prices from Tim Olson).
>> 	16MHz	$174
>> 	20MHz	$230
>> 	25MHz	$349
>
>> I'm sure that LSI Logic could also show you very low prices on their
>> RISC chips.  Last I heard, the 68030 was in the $300+ price range.

>Alot of it depends on quantity.  I'm sure NeXT and Apple are buying their
>68030s more that 100 at a time.  Many of the ASIC houses making RISCs are
>output limited.  And with most of the RISC designs, once you pay the 
>additional cost of caches and MMUs, you're way out of the 68030 league,
>cost wise.  Complete systems I've seen with both MIPS and 88k put you
>at around $1000 for the CPU subsystem.

All of these depend on quantity, and what it is you're trying to build.
Admittedly, it's hard for us to build anything less than about 6 VUPs.
I suspect you can build a CPU (+ FPU) subsystem like that for around $500,
given large quantities, maybe $400-$500 as the new cache chips come out.

>Article 6977 of comp.arch:
>From: jsp@b.gp.cs.cmu.edu (John Pieper)

>Actually, I heard a guy from Motorola talking about their n+1st generation
>680X0 machine -- they run an internal clock at 2X the external clock, and
>play some other tricks to get 14 MIPS effective, 25 MIPS max @ 25 MHz. Seems
>to me that CISC designers could do this very effectively to get ahead of the
>RISC types (modulo the design time).

[10] But remember that existing RISCs, shipping now, get 20 MIPS @ 25 MHz,
so it's hard to see how that's getting CISCs ahead. [It still is perfectly
reasonable to do, i.e., a 68040.  Plenty will get sold.]

>BTW, as far as design time goes, you have to take the RISC argument with a
>grain of salt. the 68030 is only a little different that the 68020, but with
>technology advances and just a few man-years they more than tripled the
>speed of the initial 68020 release (in 82?). The 68040 will take the same
>basic ALU design, and add the FPU. This shouldn't require too much redesign.
>The point is that a good CISC design can be modified (added to) as quickly
>as a major redesign of a RISC chip. What really counts is who can sell their
>instruction set.

Starting from scratch in 1984, and getting the first systems in mid-1986,
the high-performance VLSI RISC  [i.e., MIPS as example] is:
	1986	5 MIPS
	1987	10 MIPS
	1988	20 MIPS

But the last comment is really right: what really counts is who sells
the instruction set.  That's why the battle is pretty ferocious over
who gets to be the RISC standard (or standards), because everybody
knows there can only be a few, at most.

>Article 6987 of comp.arch:
>From: rsexton@uceng.UC.EDU (robert sexton)
>Subject: Re: The NeXT Problem

>While RISC may be cheaper(smaller design, less silicon) what you are really
>doing is shifting the cost burden onto the rest of the system.  The high
>memory bandwidth of the RISC design means more high speed memory, bigger
>high-speed caches.  With a CISC design, you put all of the high speed silicon
>on one chip, lowering the cost of all the support circuitry and memory.

[11] This is not a reasonable conclusion.  You can put caches on-chip
in either case.  A fast machine, in either case, will need a lot of
memory bandwidth: observe, for example, that the data-bandwidth should
be about the same for both.  Finally, note that people are generally
adding external caches to X86s and 68Ks to push the performance up,
for all the same reasons as RISCs. 

>Article 7005 of comp.arch:
>From: phil@diablo.amd.com (Phil Ngai)
>Subject: Re: RISC realities

[12]....reasonable discussion about burst mode I-fetches, VRAMs, etc.

>I don't think the R2000 or the Mc88000 support this, but that's not
>an inherent limitation of RISC architectures.

Nope, we don't do this, or at least not exactly.  R3000s support
"instruction-streaming", whereby when you have an I-cache miss,
you do multi-word refill into the cache, but you execute the relevant
instructions, as they go by.  Typical designs use page-mode DRAM access.
Note, of course, that in the next rounds of design across the industry,
where almost everybody goes for on-chip I-cache with burst-mode refill
(i.e., 486; >= 68030, etc), the distinction disappears.
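
A rough cycle-count sketch of what instruction-streaming buys on a miss.
The model is an assumption for illustration only: straight-line code, a
miss on the first word of a block, `latency` cycles before the first word
arrives, then one word per cycle (as with page-mode DRAM).  The function
names and the numbers are hypothetical, not R3000 specifications.

```python
def cycles_without_streaming(latency, block_words):
    """Wait for the whole block to be refilled, then execute it from cache."""
    refill = latency + block_words   # startup latency, then 1 word per cycle
    execute = block_words            # 1 instruction per cycle afterwards
    return refill + execute

def cycles_with_streaming(latency, block_words):
    """Execute each instruction in the cycle it goes by on its way into
    the cache, so execution overlaps the refill."""
    return latency + block_words

# Hypothetical numbers: 6-cycle startup latency, 4-word refill block.
print(cycles_without_streaming(6, 4))  # 14
print(cycles_with_streaming(6, 4))     # 10
```

In this model the saving is one cycle per word of the refill block, so
it grows with block size; a branch out of the block partway through
would reduce it.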

>Article 7013 of comp.arch:
>From: malcolm@Apple.COM (Malcolm Slaney)
>Subject: Re: CISCy RISC? RISCy CISC?
>P.S.  An interesting question is whether Symbolics/TI/LMI will fail because 
>the market is to small to support a processor designed for Lisp and GC or
>because CISC's are a mistake.

[13] The evidence so far is that neither reason is the most likely reason for
potential failure.  The more general reason is that special-purpose
processors that don't get real serious volume get hurt sooner or later,
for one of several reasons:
	a) A more general part ends up getting more volume, which keeps
		costs down.
	b) It's hard to stay on the technology curve without the volume.

>Article 7033 of comp.arch:
>From: eric@snark.UUCP (Eric S. Raymond)
>Subject: Re: RISC/CISC and the wheel of life.

>My understanding of RISC philosophy suggests that 3-address ops and fancy
>addressing modes are only regarded as *symptoms* of the RISC problem -- poor
>match of instructions to compiler code generator capabilities, excessive
>microcode-interpretation overhead in both cycles and chip real estate.
>
>If your compiler can make effective use of three-address instructions, and
>you've got CAD tools smart enough to gen logic for them onto an acceptably
>small % of the chip area (so that you don't have to give up on more important
>features like a big windowed register file and on-chip MMU), then I don't see
>any problem with calling the result a RISC.

[14] As noted in [7] above, 3-address instructions are NATURAL matches
to typical register-file designs; people shouldn't be assuming that
there is some big cost to having them (in terms of logic complexity).

>Article 7040 of comp.arch:
>From: doug@edge.UUCP (Doug Pardee)
>Subject: Re: CISCy RISC? RISCy CISC?
>Organization: Edge Computer Corporation, Scottsdale, AZ

>The incorrect assumption here is that you would want to build a mainframe
>using RISC technology -- that RISC technology has anything to offer at
>that price/cost level.
Well, M/2000s act like 5860s, and we think next year's M/xxxx will
make 5990s sweat some.  Why wouldn't we want to build RISC-based mainframes?
Lots of people do.

>As we at Edgcore have shown, it is both possible and practical to implement
>CISC instruction sets at speeds faster than RISC.  But -- it doesn't all fit
>on one chip.  Yet.

Could you cite some benchmarks for the newest machines?  [I don't believe
that the current production ones are faster than 25MHz R3000s, but I could
be convinced.]

>In a mainframe design, who cares if it fits on one chip?  Jeez, in our E2000
>system we need an entire triple-high VME card jam-packed with surface-mount
>parts just to hold the *caches* that we need to have to keep from starving
>the CPU.  The complexity and board area of the CPU itself is insignificant
>compared to that required by mainframe-sized multi-level memory systems.

I sort-of agree, in the sense that if you're building a physically
large/expensive box anyway, then the CPU is a small piece of the action.
On the other hand:
People who want to put mainframe (CPU performance) on desktop/deskside
systems care; weirdly enough, a whole lot of people expect to do this.

How big are the caches?  It does surprise me they're a whole big VME card,
unless they're absolutely immense.  We get 20-VUPS performance with 128K
cache, which fits with the CPU+FPU+write buffers on about a 6" x 6" square.

>Article 7041 of comp.arch:
>From: pardo@june.cs.washington.edu (David Keppel)
>Subject: Re: LISPMs not RISC? - Re: CISCy RISC? RISCy CISC?

>Oh, heck, there's some (relatively) new supercomputer being produced
>by some subsidiary of CDC (I think?) that was written up in "digital

[16] ETA is the reference.  One could argue about whether to call
it CISC or RISC, depending on what you generally think vector machines
really are.

>Also, while CISC is out of vogue in new industry designs at the
>moment, there are plenty of Universities building microcoded
>processors (read "CISC"?).

Of course, this proves little about commercial reality [that is not good
or bad; it is not the job of universities to do that.], but quite a few
folks think there is more to RISC than "being in vogue".

Whew!
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086