Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!van-bc!ubc-cs!unixg.ubc.ca!cheddar.ucs.ubc.ca!panon
From: thornburg@mtsg.ubc.ca (Jonathan Thornburg)
Newsgroups: comp.arch
Subject: multiarchitecture chips (was Re: The CPU with 3 brains)
Summary: they'll be slower than single architecture chips
Message-ID: <1990Dec10.230226.13217@unixg.ubc.ca>
Date: 10 Dec 90 23:02:26 GMT
Sender: news@unixg.ubc.ca (Usenet News Account)
Reply-To: thornburg@mtsg.ubc.ca (Jonathan Thornburg)
Followup-To: comp.arch
Distribution: na
Organization: University of British Columbia
Lines: 42

A number of recent comp.arch postings have discussed the notion of a
multiarchitecture CPU chip, one which has on the same chip one of each
of the currently popular CPU architectures, say AMD29000, i386, i860,
M68000, M88000, R3000, Sparc, and perhaps a few others I don't remember
right now.

I think such a chip would be considerably slower than a single
architecture CPU chip with the same number of transistors. The reason
is that the single architecture chip can spend the "extra" transistors
on performance: more cache, multiport register files, superscalar
execution units, fully parallel floating point arithmetic units, etc.

For example, more cache almost always boosts performance. Deeper
pipelining sometimes helps, but costs more chip area for the pipelining
registers, the extra control logic, and the extra register file ports.
VLIW or superscalar techniques (e.g. Multiflow or IBM RS/6000
respectively) can give factors of 2-4 improvement in performance, at a
cost of 4-8 times more chip area. (See Hennessy & Patterson sections
6.7 and 6.8 for details.)

A rereading of Hennessy & Patterson appendix A reveals *many* ways to
speed up floating point arithmetic if more transistors are available.
For example, p.A-45 states that 1990 chip technology can't fit a fully
parallel double precision FP multiplier on the same chip as the rest of
FP arithmetic, so today's FP chips use narrower multipliers and hence
are slower at DP FP multiply.

I could go on, but I think the point is clear: dividing 1 million
transistors among seven CPUs of roughly 140K transistors each will
yield seven CPUs, each slower than a single 1M-transistor CPU.

- Jonathan Thornburg
  Dept of Geophysics & Astronomy
  The University of British Columbia   thornburg@mtsg.ubc.ca
  Vancouver BC V6T 1W5                 userbkis@ubcmtsg.bitnet
  Canada                               ...!ubc-cs!ubcmtsg.bitnet!thornburg
--
panon@ucs.ubc.ca or USERPAP1@UBCMTSG or USERPAP1@mtsg.ubc.ca
Looking for a .signature? "We've already got one. It is ver-ry ni-sce!"
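
As a rough illustration of the closing arithmetic in the post, the
sketch below assumes that performance scales as a power law of chip
area, perf ~ area**alpha, with alpha fitted to the quoted "2-4 times
faster for 4-8 times more area" figures. The power-law model, the
pairing of 2x with 4x and 4x with 8x, and every name in the code are
assumptions made for illustration; none of them come from the post or
from Hennessy & Patterson.

```python
import math

def area_exponent(speedup, area_factor):
    """Exponent alpha in the ASSUMED model perf ~ area**alpha."""
    return math.log(speedup) / math.log(area_factor)

def relative_performance(area_fraction, alpha):
    """Performance of a CPU given a fraction of the area, relative to the full chip."""
    return area_fraction ** alpha

TOTAL_TRANSISTORS = 1_000_000   # figure from the post
NUM_CPUS = 7                    # figure from the post
per_cpu = TOTAL_TRANSISTORS // NUM_CPUS  # 142857, i.e. roughly 140K

# Assumed pairings of the quoted ranges: 2x speed for 4x area, 4x speed for 8x area.
for speedup, area_factor in [(2, 4), (4, 8)]:
    alpha = area_exponent(speedup, area_factor)
    rel = relative_performance(1 / NUM_CPUS, alpha)
    print(f"alpha={alpha:.2f}: each {per_cpu}-transistor CPU "
          f"runs at about {rel:.2f}x the 1M-transistor chip")
```

Under these assumptions each of the seven small CPUs lands at very
roughly 0.27 to 0.38 of the single large CPU's performance. The model
is crude, but it points the same way as the argument above.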