Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!uwvax!dogie.macc.wisc.edu!csd4.milw.wisc.edu!lll-winken!uunet!portal!cup.portal.com!bcase
From: bcase@cup.portal.com (Brian bcase Case)
Newsgroups: comp.arch
Subject: Re: 80486 vs. 68040 code size [really: how many regs]
Message-ID: <18549@cup.portal.com>
Date: 19 May 89 17:59:00 GMT
References: <950@aber-cs.UUCP> <25651@amdcad.AMD.COM> <4228@ficc.uu.net>
Organization: The Portal System (TM)
Lines: 80

>How about some facts on the 32bit version of the NOVIX, Phil Koopman's
>WISC chip, and Johns Hopkins Labs stack oriented chip.  All of these
>chips were faster (more MIPS per MHz) than the ...(680xx, 80x86) in '87 -
>using fairly old technology.

Not hard to do.  The same (old technology) could be said of the MIPS and
SPARC implementations of the time.

>How fast might they be if they had a sustained development 
>effort on the order required to produce the 29000 and the 88000?

I give up, how fast?

>Did the big name chip developers miss something here?  Why didn't any
>of them develop a dual (4?) stack chip, zero (ok 1 or 2) addressing 
>modes, Harvard architecture (3 data paths), 16 (or 32) instructions that 
>were essentially the chips micro-code (instruction bits fed directly 
>into the control lines on chip, very little decode time).  All of these 
>chips could do a call/branch in 1 cycle, return in 0 cycles, and passed 
>parameters lived on the stack with everything else.  Usually stacks 
>were cached on chip with overflow to memory.

Except for the 0-cycle return, most RISC chips share the same attributes.
However, have you considered the fact that the implementations of commercial
RISCs are constrained by, for example, virtual memory?  Or the need to
support many different kinds of languages?  Having 3 or 4 memory ports is
CLEARLY a great idea if it fits in with the rest of the system design
center.  For Forth, a-ok.  For a chip that's going to have TLB(s) on chip,
not OK.  To answer your questions:

None of the commercial RISCs have stack architectures because such an
architecture defeats optimization strategies and doesn't make use of the
inherently powerful (high-bandwidth, low-latency) chunk of hardware called
the 3-port (or even more ports) register file.
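
One way to picture the register-file point is a rough sketch in Python.
Both machines here are hypothetical toys (the op names OVER, NROT and the
registers r1..r5 are my own labels, not NOVIX or any real RISC encoding).
Each evaluates (a + b) * (a - b), where both operands are needed twice:

```python
# Toy comparison, NOT a model of NOVIX or any real RISC: evaluate
# (a + b) * (a - b), where both operands are needed twice.

ARITH = {
    "ADD": lambda x, y: x + y,
    "SUB": lambda x, y: x - y,
    "MUL": lambda x, y: x * y,
}

def run_stack(program, stack):
    """Run a zero-address program; operands are implicitly the top of stack."""
    stack = list(stack)
    for op in program:
        if op == "OVER":            # ( x y -- x y x )
            stack.append(stack[-2])
        elif op == "NROT":          # ( x y z -- z x y ), like Forth's -ROT
            top = stack.pop()
            stack.insert(-2, top)
        else:                       # binary arithmetic on the top two items
            y = stack.pop()
            x = stack.pop()
            stack.append(ARITH[op](x, y))
    return stack

def run_register(program, regs):
    """Run a 3-address program; every instruction names dst, src1, src2."""
    regs = dict(regs)
    for op, dst, src1, src2 in program:
        regs[dst] = ARITH[op](regs[src1], regs[src2])
    return regs

# Stack version: three of the six ops only copy or reorder operands.
STACK_PROGRAM = ["OVER", "OVER", "ADD", "NROT", "SUB", "MUL"]

# Register version: a and b stay in r1 and r2 and are simply named again.
REG_PROGRAM = [
    ("ADD", "r3", "r1", "r2"),
    ("SUB", "r4", "r1", "r2"),
    ("MUL", "r5", "r3", "r4"),
]

stack_result = run_stack(STACK_PROGRAM, [7, 3])
reg_result = run_register(REG_PROGRAM, {"r1": 7, "r2": 3})
print(stack_result[-1], len(STACK_PROGRAM))   # 40 6
print(reg_result["r5"], len(REG_PROGRAM))     # 40 3
```

Instruction count is not the whole story (zero-address instructions are
smaller), but the sketch shows the kind of value reuse that a register
file gives a compiler for free and a stack has to pay for in shuffling.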

All commercial RISC chips have 0 or 1 addressing mode.  (This is using my
definition of RISC.)

Most commercial RISCs have a Harvard implementation with a separate path
for instructions and data.  Implementing more 32-bit paths would have bad
consequences like slowing them all down.

While decoding a NOVIX instruction is simpler than that of most commercial
RISC instructions, the difference is not important.  What is important is
that the pipeline consisting of fetch, decode, execute, and write back (maybe
a memory stage in there too, like in MIPS) be uniform, that is, all stages
requiring essentially the same number of gate delays.  Greatly simplifying
the decoding beyond what has already been done is not productive for these
architectures.  (Analogy:  RISCs have fast procedure call mechanisms.
Speeding up the procedure call by another factor of 10 would have little
consequence.)
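
A back-of-the-envelope sketch of both halves of that argument, in Python.
All numbers are invented for illustration (the gate-delay figures and the
2% call-overhead fraction are assumptions, not measurements of any chip):

```python
def cycle_time(stage_delays):
    """A pipeline's clock period is set by its slowest stage."""
    return max(stage_delays.values())

def overall_speedup(fraction, factor):
    """Amdahl's law: speed up only `fraction` of the run time by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Hypothetical, roughly balanced pipeline (delays in gate delays).
balanced = {"fetch": 20, "decode": 18, "execute": 20, "writeback": 19}

# Same pipeline with decode made drastically simpler.
trivial_decode = dict(balanced, decode=5)

print(cycle_time(balanced))        # 20
print(cycle_time(trivial_decode))  # 20, fetch and execute still set the clock

# If calls are already cheap (assume 2% of run time), making them 10x
# faster buys less than 2% overall.
print(overall_speedup(0.02, 10.0))
```

Once the stages are balanced, shaving one of them does nothing for the
clock; the other stages have to shrink too before the cycle time moves.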

Commercial RISCs have 1 cycle branches and 1 cycle returns (which is really
a branch).  It might have been possible to architect and implement an
indirect branch (i.e. return) instruction that also specifies a (2-address)
arithmetic op, similar to what the NOVIX, et al. have.  It would be somewhat
complex and irregular.  Scheduling the op in the delay slot is much more
general and doesn't make the ALU op different from all the rest (2-address
vs. 3-address).  The win would probably be small in these architectures.
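
To make the delay-slot point concrete, here is a minimal sketch in Python
of a made-up machine with one branch delay slot. The instruction names
(add, jmpr, nop, halt) and the register "ra" are hypothetical, and the
simulator ignores corner cases such as a branch inside a delay slot:

```python
def run(program, regs):
    """Execute a toy program with one branch delay slot.

    The instruction after a `jmpr` always executes before control
    transfers. Returns the final registers and the number of
    instructions executed (not counting `halt`).
    """
    regs = dict(regs)
    pc = 0
    pending = None      # branch target waiting for the delay slot to finish
    executed = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "halt":
            break
        next_pc = pc + 1
        if pending is not None:     # this instruction sits in a delay slot
            next_pc, pending = pending, None
        if op == "add":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "jmpr":          # indirect jump through a register
            pending = regs[args[0]]
        # "nop" does nothing
        executed += 1
        pc = next_pc
    return regs, executed

# Unscheduled: do the add, then return with an empty delay slot.
unscheduled = [("add", "r3", "r1", "r2"), ("jmpr", "ra"), ("nop",), ("halt",)]

# Scheduled: the ordinary 3-address add moves into the delay slot.
scheduled = [("jmpr", "ra"), ("add", "r3", "r1", "r2"), ("halt",)]

regs_u, count_u = run(unscheduled, {"r1": 4, "r2": 5, "ra": 3})
regs_s, count_s = run(scheduled, {"r1": 4, "r2": 5, "ra": 2})
print(regs_u["r3"], count_u)   # 9 3
print(regs_s["r3"], count_s)   # 9 2
```

The add that fills the slot is the same instruction as every other add;
nothing special has to be built into the return to get the overlap.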

Commercial RISCs also cache the "stack", and everything else, on chip in the
general-purpose 3-port register file.  A processor has a memory hierarchy
with the register file at the top of the pyramid.  There are different ways
to design the register file.  NOVIX, et al. chose one way, everybody else
chose another.  Which is more general purpose?  Clearly, the 3-port register
file.

One bit (ok, one-half bit) of wisdom:  It is not always useful to look at
the features of one architecture and then conjecture that another would be
improved by adopting those features.  The reason is that an architecture is
a unit, the expression of a conceptual approach.  There is a complex matrix
of dependencies between the features of any architecture so that removing
one feature can invalidate all or a large fraction of the other features.
It is like saying that Chopin's piano concertos need to have a heavy beat.
The design center of the NOVIX chip, e.g., permits several memory ports and
a stack orientation.  This is fine if you can live with 64K of stack.  Chopin
is great until you want to go disco dancing....

Claim:  The best architectures are those that appear to have been designed
by one person.