Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!ucla-cs!zen!ucbvax!hplabs!nsc!voder!apple!bcase
From: bcase@apple.UUCP
Newsgroups: comp.arch
Subject: Re: D-machine helped spawn RISC
Message-ID: <6281@apple.UUCP>
Date: Fri, 18-Sep-87 13:43:00 EDT
Article-I.D.: apple.6281
Posted: Fri Sep 18 13:43:00 1987
Date-Received: Sun, 20-Sep-87 02:10:13 EDT
References: <347@erc3ba.UUCP> <478@esunix.UUCP> <2785@ames.arpa> <6266@apple.UUCP> <600@rocky.STANFORD.EDU>
Reply-To: bcase@apple.UUCP (Brian Case)
Organization: Apple Computer Inc., Cupertino, USA
Lines: 63

In article <600@rocky.STANFORD.EDU> andy@rocky.UUCP (Andy Freeman) writes:
>Flynn used the same compiler/optimizer with different final code generators
>to study a number of different architectures.  (The compiler and optimizer
>were written under John Hennessy's direction a few years ago.  Yes, that
>Hennessy.)

John Hennessy certainly knows up from down when it comes to compilers.
But, in my humble opinion, many really important optimizations happen
*after* code generation; this is especially true for RISCs, I believe.
Looking at the output of modern, commercial "optimizing" compilers, I
am appalled at the code quality for certain cases.  Just because a
textbook says that optimization occurs before code generation doesn't
mean that's the best way.
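
As a concrete illustration of an optimization that runs *after* code
generation, here is a minimal peephole-pass sketch.  It is not taken from
any compiler discussed in this thread; the tuple instruction format
("op", register, address) and the function name `peephole` are made up
for the example, and it assumes ordinary (non-volatile) memory.

```python
def peephole(instrs):
    """Remove a load that immediately re-reads what was just stored.

    Each instruction is a hypothetical tuple: (opcode, register, operand).
    Assumption: memory is ordinary (not volatile or memory-mapped I/O),
    so "store rX -> A" followed by "load A -> rX" leaves rX unchanged.
    """
    out = []
    for ins in instrs:
        if out:
            prev = out[-1]
            redundant = (
                prev[0] == "store"
                and ins[0] == "load"
                and prev[1] == ins[1]   # same register
                and prev[2] == ins[2]   # same address
            )
            if redundant:
                continue  # drop the load; the register already holds the value
        out.append(ins)
    return out


code = [
    ("store", "r1", "x"),
    ("load", "r1", "x"),   # redundant: r1 still holds x
    ("add", "r2", "r1"),
]
optimized = peephole(code)
```

A pass like this only sees the pattern once real loads and stores exist,
which is the sense in which it cannot run before code generation.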

> All of the architectures had the same ALU; they differed in
>instruction format and register set architecture.  (They compared different
>register window schemes with monolithic register sets of various sizes.)
>Since all of the tests used the same compiler and optimizer, much of the
>remaining differences were due to differences between the architectures.

This is the claim I don't believe, not even a little bit.

>Remember,
>the critical path in MIPS, MIPS-X, and the Berkeley RISC processors is
>not in the control logic; I don't know about MIPS Co's product.

I'm not so sure that I believe this statement.  It is true that, in most
of the cases listed, little *area* was spent on control logic, but, at
least for the original Stanford MIPS, the master pipeline controller was
a real problem.  Remember, whether or not to "complexify" an instruction
set definition is driven (or should be) by what software (compiler, OS)
wants/can deal with, not *only* by what hardware can stand.  The fact
that I can maintain cycle time even if I "complexify" the instruction
set does not mean it is the right thing to do!  What if the compiler
never emits those complex instructions?

>``From data traffic considerations, it seems that the [360-like CISC]
>with a register set of about size 16 plus a small data cache is preferable
>to multiple register sets for most area combinations.''

Again, I question the compiler effort here.

>Maybe instruction bandwidth isn't important, but data bandwidth seems
>to be.  As Flynn and company conclude, ``@i[Balanced optimization] is
>the key to overall instruction set efficiency.''  Let's see some data
>from RISC folks.

Bandwidth is not the only consideration:  LATENCY is often more important
where loads/stores are concerned (at least in machines like RISC II, Am29000,
and, I suspect, SPARC, which have a relatively low percentage of loads/stores).
High instruction bandwidth is very important for RISC machines; latency
is also important but there are techniques for dealing with this so that
it won't be so apparent at the chip boundary.  Techniques like interleaving
and using burst-mode memories (VDRAMS, SCDRAM, nibble-mode, etc.) can deal
with sequential bandwidth, but if it takes 2 milliseconds to get the first
word, who cares?  Latency, latency, latency.  Thus, arguments against RISC
founded on bandwidth requirements, directed at me at least, will fall on
deaf ears.
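
To put rough numbers on the latency point, here is a minimal sketch of a
burst transfer where the first word costs more than the following ones.
The function names and the timings (200 ns to the first word, 50 ns per
subsequent word) are invented for illustration, not measurements of any
real memory part.

```python
def burst_time_ns(words, first_word_ns, next_word_ns):
    """Total time for a sequential burst of `words` words.

    Simple model (an assumption): the first word costs `first_word_ns`
    (the latency); each later word costs `next_word_ns` (the burst rate).
    """
    if words < 1:
        raise ValueError("a burst must transfer at least one word")
    return first_word_ns + (words - 1) * next_word_ns


# Hypothetical baseline: 4-word burst, 200 ns to first word, 50 ns after.
baseline = burst_time_ns(4, 200, 50)          # 200 + 3*50

# Double the sequential bandwidth (halve the per-word time).
faster_burst = burst_time_ns(4, 200, 25)      # 200 + 3*25

# Halve the first-word latency instead.
lower_latency = burst_time_ns(4, 100, 50)     # 100 + 3*50

print(baseline, faster_burst, lower_latency)
```

With these made-up numbers, doubling burst bandwidth takes the transfer
from 350 ns to 275 ns, while halving latency takes it to 250 ns; for a
single-word load the burst rate does not matter at all.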

About the only real data that I can offer is that the percentage of
loads/stores for stack-cache machines (RISC II, SPARC, Am29000, etc.) is
often about 1/2 that observed in machines with only flat register files
(MIPS, etc.).