Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!vsi1!altnet!uunet!portal!cup.portal.com!
From: bcase@cup.portal.com (Brian bcase Case)
Newsgroups: comp.arch
Subject: Re: CISCy RISC? RISCy CISC?
Message-ID: <10192@cup.portal.com>
Date: 19 Oct 88 18:59:16 GMT
References: <973@naucse.UUCP>
Organization: The Portal System (TM)
Lines: 97

>(I just know I'm going to regret this, but hey, it's late.)

Nah, we're as gentle as a spring rain in this group.  :-)

>Just what is it about RISC vs. CISC that really sets them apart?
>With my very naive understanding, it really seems that the big
>difference is that RISC models will let one get into the high
>speed technologies faster (which we really haven't seen out on
>the market place yet).  Other than that, I doubt I would care
>whether my machine is RISC or CISC, if I can even tell them apart.

You are right!  *Who cares* what it is, as long as it meets your
needs.  This is the old concept of the "High-level language machine"
about which Patterson wrote before the RISC I came out.  The point
is that a HLL machine can be perfectly well implemented at a VERY
low level as long as the user sees a HLL machine.

BUT, RISC is easier to make go fast than CISC.  Some, but only some,
of the advantage is that RISCs are starting fresh while most CISCs
must be backward compatible.  Even so, there are no new CISC designs
being done, that I know of.  The point is, if you have a choice, you'd
be dumb to design a CISC instead of a RISC, at least with what we
know at this time.

The real difference:  Optimizing compilers can do a great job of
optimizing at a low level, but not at a higher level.  RISC
instructions implement the right primitives; CISC instructions
implement groups of operations at once, which prevents the compiler
from breaking them up so that the individual parts can be eliminated,
factored out of loops, reused, etc.  Now, given that the compiler
wants simple, composable primitives, we notice with glee that these
are exactly the things that can be implemented in a uniform pipeline!
Wow!  *Synergy*:  the whole is greater than the sum of the parts.
There is much more to it, things like registers and exposed
parallelism, but I think this pretty much sums it up (if possible).
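
A rough sketch of the "breaking them up" point, in C.  Everything here
is made up for illustration: fused_load() stands in for a hypothetical
complex instruction that scales an index, adds a base and an offset,
and loads, all in one opaque step.  (A real C compiler would simply
inline it; treat it as a model of an instruction the compiler cannot
see inside.)  The second function is the same loop after the pieces
are exposed as primitives, with the invariant part hoisted and the
multiply reduced to an add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "complex" operation: index scaling, offset add, and load
   fused into one step.  Modelled as a function the optimizer is assumed
   not to see inside, so all of the work repeats on every iteration. */
static int fused_load(const int *base, size_t row, size_t ncols, size_t col)
{
    return base[row * ncols + col];
}

/* Column sum using only the fused operation. */
long sum_column_fused(const int *a, size_t nrows, size_t ncols, size_t col)
{
    assert(col < ncols);
    long sum = 0;
    for (size_t r = 0; r < nrows; r++)
        sum += fused_load(a, r, ncols, col);
    return sum;
}

/* The same loop built from simple primitives.  Once the parts are
   visible, the "+ col" is done once outside the loop and the per-row
   multiply becomes a single add. */
long sum_column_primitives(const int *a, size_t nrows, size_t ncols, size_t col)
{
    assert(col < ncols);
    long sum = 0;
    size_t idx = col;                 /* loop-invariant add, hoisted */
    for (size_t r = 0; r < nrows; r++) {
        sum += a[idx];                /* plain load */
        idx += ncols;                 /* multiply replaced by an add */
    }
    return sum;
}
```

Both functions return the same sum; the second just does less work
per iteration, which is the kind of rewrite a compiler can only make
when the instruction set lets it name the individual steps.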

Would you buy a book called "Understanding RISC" if someone wrote it?
I hope so!

>A case in point.  I know of a not-yet-announced machine that has just
>about the largest instruction set I can imagine (not to mention the 15+
>addressing modes).  However, [it] has features that give RISC chips
>their performance - zillions registers, big I & D caches, etc., and most
>instructions down to 1 cycle per instruction.  result is a 12.5MHz machine
>that runs 25000 (claimed) dhrystones using what I would call a 'throwaway'
>C compiler.  The manufacturers can push the clock to 30MHz, which would give
>>40000 dhrystones.

Well, if it's true it's true.  If dhrystones scale with clock rate on this
machine, 30 MHz would give 60K dhrystones, which is not bad.  2000
dhrystones per MHz is better than the typical 1100-1500 per MHz.  My guess,
though, is that most of the instructions in the set do not contribute
commensurately with their implementation cost.  Studies have shown again and
again that it's hard to beat an instruction set with load, store, add, cmp,
delayed branches (maybe compare-and-branch if that fits in the pipe), and
call, at least for systems code.  Very few other instructions, but some,
contribute more than 1% to *overall* performance.  Did these guys really
study the frequencies of execution and total time taken?  Perhaps they
did.  I think I know who you are talking about, but I can't say anything.
Would this processor come from Germany?  And many RISCs and re-implemented
CISCs will be at more than 30 MHz soon (?).
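
For anyone checking the arithmetic on the claimed figures (25000
dhrystones at 12.5 MHz), here is a small C sketch.  The function names
are invented for this example, and the scaling assumes performance is
strictly linear in clock rate, which real machines rarely manage once
memory gets involved.

```c
#include <assert.h>

/* Dhrystones delivered per MHz of clock. */
double dhrystones_per_mhz(double dhrystones, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return dhrystones / clock_mhz;
}

/* Dhrystones at a new clock rate, ASSUMING perfectly linear scaling. */
double scaled_dhrystones(double dhrystones, double old_mhz, double new_mhz)
{
    assert(old_mhz > 0.0);
    return dhrystones * new_mhz / old_mhz;
}
```

With the numbers quoted above, dhrystones_per_mhz(25000, 12.5) gives
2000 and scaled_dhrystones(25000, 12.5, 30) gives 60000.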

>...but these seem to me to be very fine numbers, at least compared with the
>uVAXen I've played with.  How do they compare to current RISCs?  I'd bet
>pretty much the same.

Yup, pretty much the same.  I suspect they implemented the simple instructions
as a RISC would, and *that* is the reason that its performance is good, not
the existence of all the other instructions.  But there is always room for
a discovery or two.

>When the really fast chips come in, I bet the RISC machines are the first
>to come out, but still, is there something that will keep CISC from
>catching up?

There are many techniques that can be applied to both RISC and CISC
machines to make them fast.  However, some of them are MUCH more difficult
to implement when the length and format of instructions aren't known a priori
and when instructions can have multiple effects and require multiple cycles.
Thus, the CISC guys will initially implement the simple, RISC-like subsets
of their instruction sets using a uniform pipeline.  The other instructions
will run at the old speeds or a little better.  But, these techniques don't
fix things like two-address instructions vs. three-address instructions,
they can't add more registers (although I know someone is going to do that
to their machine, but it is not simple!), and they can't add some of the
exposed parallelism that RISCs can have.  A CISC can also have a
decoded-instruction cache, but this can add latency when the next needed
instruction isn't in that cache, which holds far fewer instructions than a
regular instruction cache (since the decoded instructions are much bigger
than the encoded ones).
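
To make the "known a priori" point concrete, here is a toy sketch in
C.  The encodings are hypothetical and not taken from any real machine:
the fixed format is 4 bytes per instruction, and the variable format
keeps its length (1 to 4 bytes) in the low two bits of the first byte.
With the fixed format, the start of the n-th instruction is plain
arithmetic; with the variable format, every earlier instruction has to
be examined first, which is what makes fetching and decoding several
instructions at once so much harder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FIXED_LEN = 4 };   /* hypothetical fixed instruction size, bytes */

/* Fixed-length format: the offset of instruction n is known without
   reading a single instruction byte. */
size_t fixed_nth_offset(size_t n)
{
    return n * FIXED_LEN;
}

/* Hypothetical variable-length format: low 2 bits of the first byte
   hold (length - 1), so instructions are 1 to 4 bytes long. */
static size_t var_insn_length(uint8_t first_byte)
{
    return (size_t)(first_byte & 0x3u) + 1;
}

/* Variable-length format: the offset of instruction n (0-based) is found
   only by stepping through every earlier instruction in order.
   Returns (size_t)-1 if the code stream ends before instruction n. */
size_t variable_nth_offset(const uint8_t *code, size_t code_len, size_t n)
{
    assert(code != NULL);
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        if (off >= code_len)
            return (size_t)-1;
        off += var_insn_length(code[off]);   /* serial dependency */
    }
    return off < code_len ? off : (size_t)-1;
}
```

The loop in variable_nth_offset() is the serial dependency in
question: hardware can shorten it with predecode bits or a decoded
cache, but it cannot make it disappear the way a fixed format does.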

The point is that RISCs are probably going to be smaller, faster, and
cheaper than CISCs, and implementing out-of-order execution and pattern
matching to allow multiple instructions per cycle will probably be much
easier.

Stay tuned....  We'll all see how this turns out!