Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: 68040 where is it?
Message-ID: <PCG.90Aug29161206@athene.cs.aber.ac.uk>
Date: 29 Aug 90 15:12:06 GMT
References: <SMITHW.90Aug22180704@hamblin.hamblin.math.byu.edu>
	<33156@cup.portal.com> <25146@boulder.Colorado.EDU>
	<2451@crdos1.crd.ge.COM>
	<CHUCK.PHILLIPS.90Aug25143508@halley.FtCollins.NCR.COM>
	<1990Aug26.024212.12390@zoo.toronto.edu>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 96
In-reply-to: henry@zoo.toronto.edu's message of 26 Aug 90 02:42:12 GMT

On 26 Aug 90 02:42:12 GMT, henry@zoo.toronto.edu (Henry Spencer) said:

henry> In article
henry> <CHUCK.PHILLIPS.90Aug25143508@halley.FtCollins.NCR.COM>
henry> Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) writes:

Phillips> While we're discussing rumors, I've been told (by someone I'd
Phillips> _expect_ to know) that the 68040 has roughly the same integer
Phillips> throughput as a SPARC at the same clock speed.

henry> This should not be an enormous surprise.  The existing SPARCs all
henry> do about one instruction per cycle, and the 68040 designers moved
henry> heaven and earth (at great expense in design time and silicon) to
henry> make the 68040 do likewise for the simpler instructions.

I don't think it was all that difficult, actually; the RISC subset of the
68K architecture (in instructions and addressing modes) is not that
complicated. It all depends on whether they wanted to implement the RISC
subset with an underlying load-store architecture or whether they wanted
to do as the 486 does and play hard tricks with the cache (treating it
as a large register bank).

In theory you can just RISC'ify a small subset of M68K instructions:
the loads and stores, plus only the register-register modes of the
other instructions. Some people, as I remember, used this trick to build
fast 68k clones (e.g. EDGE, if I remember well) using MSI components.
You would then want to recompile things, though.
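
A minimal sketch of what picking such a subset amounts to, in Python for
brevity. The instruction tuples and the name is_risc_subset are
hypothetical, made up for illustration; they are not real 68K encodings
or anybody's actual decoder. The assumption is that operands are either
registers or memory, and that only moves may touch memory.

```python
# Hypothetical model: an instruction is (mnemonic, source_kind, dest_kind),
# where each operand kind is "reg" or "mem". These are not real 68K encodings.

def is_risc_subset(instr):
    """Return True if instr fits a load/store discipline."""
    op, src, dst = instr
    if op == "move":
        # A move may be a load, a store, or a register copy;
        # a memory-to-memory move falls outside the subset.
        return not (src == "mem" and dst == "mem")
    # Every other instruction must be register-register.
    return src == "reg" and dst == "reg"

program = [
    ("move", "mem", "reg"),  # load
    ("add", "reg", "reg"),   # register-register ALU operation
    ("add", "mem", "reg"),   # memory operand: outside the subset
    ("move", "mem", "mem"),  # memory-to-memory: outside the subset
    ("move", "reg", "mem"),  # store
]

# Split the program into the part a simple core could run directly
# and the part that needs the complex path (or a recompile).
fast_path = [i for i in program if is_risc_subset(i)]
slow_path = [i for i in program if not is_risc_subset(i)]
```

A compiler retargeted to the subset would simply never emit anything
that lands in slow_path, which is why recompiling matters.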

I think everybody remembers that when the PL.8 compiler was retargeted
to a RISC subset of 370 instructions using only RR instructions for non
load/stores the generated code was *faster* than otherwise -- i.e. the
370 is already often implemented internally as a RISC core with
paraphernalia appended.

henry> The real question is, which one will scale to higher clock speeds

Well, things are not that simple. We have three alternatives really:

Pure RISC	You only got simple instructions and load store.
		Code is big, CPU has low transistor count, instructions
		are fast.

Pure CISC	You only got complex instructions and no special casing.
		Code is small, CPU has medium transistor count, instructions
		are slow.

RISCy CISC	You got simple instructions and address modes
		implemented as if they were RISC; complex instructions
		and addressing modes are there for backwards
		compatibility.
		Code is small, CPU has large transistor count, there
		are both slow and fast instructions.

Actually there is another alternative, mostly used in mainframes e.g.
some 370 and very high end VAXes:

Super CISC	You have a super parallel CPU that decodes and executes
		complex instructions with lots of internal parallelism.
		Code is small, CPU has colossal transistor count,
		all instructions are fast.

henry> and more-than-one-instruction-per-cycle execution schemes better?
henry> Hint: the simpler one has a decided edge here.

In terms of cost effectiveness there seems to be evidence that Pure RISC
is better than Pure CISC. The choice between RISCy CISC and Pure RISC is
not that clear, however. Architectural efficiency is comparable, so the
contest, as indicated by Spencer, may be decided by the much lower
transistor count of Pure RISC, which allows use of more advanced (faster
if less dense) technology.

There are however technical factors that favour RISCy CISC. One is that
higher code density, which conserves memory bandwidth, is not
irrelevant: the so-called "RISC window", which opens when memory gets
relatively faster than CPUs, may be closing. Another is the ability to
support rare but important applications better thanks to the CISC part
of the instruction set.
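
The code density point is simple arithmetic, sketched here in Python.
Every figure is made up for illustration, and the function and constant
names are hypothetical. The sketch assumes both encodings execute the
same number of instructions and ignores caches entirely, so it shows
only the direct effect of instruction size on fetch traffic.

```python
# All figures are invented for illustration; none are measurements.

def fetch_bandwidth(instr_per_sec, bytes_per_instr):
    """Instruction-fetch traffic in bytes per second."""
    return instr_per_sec * bytes_per_instr

def sustainable_rate(bandwidth, bytes_per_instr):
    """Instructions per second that a given fetch bandwidth can feed."""
    return bandwidth / bytes_per_instr

RATE = 20_000_000   # assumed: 20 million instructions per second
FIXED_SIZE = 4      # assumed: fixed 4-byte instructions
DENSE_SIZE = 3      # assumed: 3-byte average for a denser encoding

# Fetch traffic each encoding generates at the same instruction rate.
fixed_traffic = fetch_bandwidth(RATE, FIXED_SIZE)
dense_traffic = fetch_bandwidth(RATE, DENSE_SIZE)

# Instruction rate the denser encoding could reach on the memory
# bandwidth that the fixed-size encoding needs.
dense_rate = sustainable_rate(fixed_traffic, DENSE_SIZE)
```

With these assumed numbers the denser encoding needs a quarter less
fetch bandwidth for the same instruction rate. How much that matters
depends on how fast memory is relative to the CPU.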

Non technical considerations are that usually the best (fastest or
densest) technology is only available to the largest manufacturers,
which are however wedded to CISC architectures; in a sense RISC
therefore is how smaller players get comparable performance even if they
use less advanced technology (vide SPARC on a gate array).

My opinion is that a million-plus transistor budget would be better
spent on multiple SPARCs/MIPSes/M88Ks/29Ks/ARMs/NOVIXes per chip than
on a RISCy CISC, but the players who can afford a million-plus
transistor budget have a vested interest in old CISC architectures. I
also think that RISCs had better do something about code density,
because the relative speed of memory and CPU may change again. Stack
RISCs instead of load-store ones are my favourite dream.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk