Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!van-bc!ubc-cs!unixg.ubc.ca!cheddar.ucs.ubc.ca!panon
From: thornburg@mtsg.ubc.ca (Jonathan Thornburg)
Newsgroups: comp.arch
Subject: multiarchitecture chips (was Re: The CPU with 3 brains)
Summary: they'll be slower than single architecture chips
Message-ID: <1990Dec10.230226.13217@unixg.ubc.ca>
Date: 10 Dec 90 23:02:26 GMT
Sender: news@unixg.ubc.ca (Usenet News Account)
Reply-To: thornburg@mtsg.ubc.ca (Jonathan Thornburg)
Followup-To: comp.arch
Distribution: na
Organization: University of British Columbia
Lines: 42

A number of recent comp.arch postings have discussed the notion of a
multiarchitecture CPU chip, one which has on the same chip one of each
of the currently popular CPU architectures, say AMD29000, i386, i860,
M68000, M88000, R3000, Sparc, and perhaps a few others I don't remember
right now.
 
I think such a chip would be considerably slower than a single
architecture CPU chip with the same number of transistors.  The reason
is that the single architecture chip can use the "extra" transistors to
boost performance with things like more cache, multiport register files,
superscalar execution units, fully parallel floating point arithmetic
units, etc.
 
For example, more cache almost always boosts performance.  Deeper
pipelining sometimes helps, but costs more chip area for the pipelining
registers, the extra control logic, and the extra register file ports.
VLIW or superscalar techniques (e.g. Multiflow or IBM RS6000 respectively)
can improve performance by a factor of 2-4, at a cost of 4-8 times more
chip area.  (See Hennessy & Patterson sections 6.7 and 6.8 for details.)
 
A rereading of Hennessy & Patterson appendix A reveals *many* ways to
speed up floating point arithmetic if more transistors are available.
For example, p. A-45 states that 1990 chip technology can't fit a fully
parallel double precision FP multiplier on the same chip as the rest of
FP arithmetic, so today's FP chips use narrower multipliers and hence
are slower at DP FP multiply.
 
I could go on, but I think the point is clear:  Dividing 1 million
transistors into 7 CPUs of roughly 140K transistors each will yield
7 CPUs, each slower than a single 1M-transistor CPU.
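
To make the arithmetic concrete, here is a rough sketch of the budget
split.  The even split and the helper name per_core_budget are my own
assumptions for illustration; real designs would not divide transistors
equally, and transistor count is only a crude proxy for performance.

```python
# Hypothetical illustration using the round numbers from the text above.
TOTAL_TRANSISTORS = 1_000_000   # assumed total chip budget
NUM_ARCHITECTURES = 7           # one CPU per architecture


def per_core_budget(total, cores):
    """Split a transistor budget evenly among cores (integer division)."""
    return total // cores


budget = per_core_budget(TOTAL_TRANSISTORS, NUM_ARCHITECTURES)
ratio = TOTAL_TRANSISTORS / budget

print(f"each of {NUM_ARCHITECTURES} CPUs gets about {budget:,} transistors")
print(f"a single-architecture CPU has about {ratio:.1f}x as many to spend")
```

Everything beyond that per-CPU budget is what the single architecture
chip can put into cache, extra register file ports, or wider FP units.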
 
- Jonathan Thornburg

  Dept of Geophysics & Astronomy
  The University of British Columbia        thornburg@mtsg.ubc.ca
  Vancouver     BC     V6T 1W5            userbkis@ubcmtsg.bitnet
  Canada                      ...!ubc-cs!ubcmtsg.bitnet!thornburg
--
      panon@ucs.ubc.ca             or          USERPAP1@UBCMTSG
                                   or          USERPAP1@mtsg.ubc.ca
Looking for a .signature? "We've already got one. It is ver-ry ni-sce!"