Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!lll-tis!ames!lamaster
From: lamaster@ames.arpa (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Gate counts for implementations of architectures
Message-ID: <3793@ames.arpa>
Date: 30 Dec 87 21:07:55 GMT
Reply-To: lamaster@ames.UUCP (Hugh LaMaster)
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 40
Keywords: RISC, gates, instruction set


One of the questions that has not been discussed much in the RISC discussion
is the amount of chip real estate that must be devoted to implementing
the instructions in a given architecture.  The question of critical paths
for branch instructions and addressing modes is certainly significant, but
chip area is a somewhat different question.  Initial enthusiasm aside, the
RISC question is largely one of figuring out which instructions, and
their hardware realization, give the most speed for a set of applications.
A recent Computer magazine had a list which showed some GaAs processors
with very fast clocks, but very small gate counts.  This is an extreme
example of the question that has always come up when building fast machines:
Is it worth it to add a particular function?

For example, various Cray CPUs have had about 500K gates in them: this
includes hardware integer add, multiply, divide, floating add, multiply,
reciprocal approximation (fully segmented), and shift/mask instructions.
The Cyber 205 has about twice as many gates - and has a 
slower clock speed.  Are the extra gates worth it on the Cyber 205, even
at the cost of having a slower clock speed (note: it is easy to find
applications which make either machine look faster)?  If
less is more (RISC), how about 10K gates maximum, if that is what can be
put on a single GaAs microprocessor?  Maybe I can simulate floating
point with integer arithmetic faster than having special f.p. hardware,
if by skipping f.p. I can put a processor on one very fast single chip.
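
To put rough numbers on that last trade-off, here is a minimal sketch of
a break-even calculation.  Everything in it is an assumption made for
illustration: the function name, the one-cycle cost for every non-f.p.
operation on both machines, and the sample figures are hypothetical and
are not measurements of any real processor.

```python
def max_emulation_cycles(clock_ratio, hw_fp_cycles, fp_fraction):
    """Largest software f.p. cost (cycles per f.p. operation) at which a
    faster chip without f.p. hardware still ties a slower chip with it.

    clock_ratio  -- fast clock rate / slow clock rate (hypothetical)
    hw_fp_cycles -- cycles per f.p. operation on the slower machine
    fp_fraction  -- fraction of executed operations that are f.p.

    Assumption: every non-f.p. operation takes one cycle on both machines.
    """
    if not 0.0 < fp_fraction <= 1.0:
        raise ValueError("fp_fraction must be in (0, 1]")
    if clock_ratio <= 0 or hw_fp_cycles <= 0:
        raise ValueError("clock_ratio and hw_fp_cycles must be positive")

    other = 1.0 - fp_fraction
    # Average cycles per operation on the slower machine with f.p. hardware.
    slow_cycles_per_op = other + fp_fraction * hw_fp_cycles
    # Tie condition: other + fp_fraction * s == clock_ratio * slow_cycles_per_op
    return (clock_ratio * slow_cycles_per_op - other) / fp_fraction


if __name__ == "__main__":
    # Hypothetical case: 4x faster clock, 2-cycle hardware f.p., half the
    # operations are f.p.
    print(max_emulation_cycles(4.0, 2.0, 0.5))
```

Under these made-up figures the software routine may spend up to 11
cycles per f.p. operation before the faster chip loses.  The budget
shrinks quickly as the clock advantage narrows, which is the heart of
the question.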

Question:  What are the gate counts for various implementations of the
same architecture (it would be illuminating to compare a 360/50 with
a 360/91, for example: same architecture, but one processor pipelined,
with fast floating point), and of different architectures?

What instructions increase gate count inordinately?  Are there particular
"bad guy" instructions which take up a lot of space (I mean besides
floating point instructions, and issue in itself...)?

Would anyone from Sun, MIPS, or AMD care to comment on how many gates
there are in their processors - and their floating point coprocessors?

And what about the anti-RISC argument which says that microcoded machines
are more efficient with chip area because they have less random logic?
(All gates are not equal: physically regular gates are more equal than
random gates.)