Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: 68040 where is it?
Message-ID:
Date: 29 Aug 90 15:12:06 GMT
References: <33156@cup.portal.com> <25146@boulder.Colorado.EDU> <2451@crdos1.crd.ge.COM> <1990Aug26.024212.12390@zoo.toronto.edu>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 96
In-reply-to: henry@zoo.toronto.edu's message of 26 Aug 90 02:42:12 GMT

On 26 Aug 90 02:42:12 GMT, henry@zoo.toronto.edu (Henry Spencer) said:

henry> In article
henry> Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) writes:

Phillips> While we're discussing rumors, I've been told (by someone I'd
Phillips> _expect_ to know) that the 68040 has roughly the same integer
Phillips> throughput as a SPARC at the same clock speed.

henry> This should not be an enormous surprise. The existing SPARCs all
henry> do about one instruction per cycle, and the 68040 designers moved
henry> heaven and earth (at great expense in design time and silicon) to
henry> make the 68040 do likewise for the simpler instructions.

I don't think it was all that difficult actually; the RISC subset of the
68K architecture (in instructions and addressing modes) is not that
complicated. It all depends on whether they wanted to implement the RISC
subset with an underlying load-store architecture or whether they wanted
to do as the 486 does and play hard tricks with the cache (treating it
as a large register bank).

In theory you can just RISC'ify a small subset of M68K instructions,
and for the non load/store instructions only their register-register
modes. Some people, I remember, used this trick to build fast 68k clones
(e.g. EDGE, if I remember well) using MSI components. You then want to
recompile things, though, so that the generated code stays within the
fast subset.
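To make the "RISC subset" idea concrete, here is a minimal sketch of the
rule described above: load/store instructions may touch memory through a
simple addressing mode, everything else must be register-register. The
instruction and mode lists are illustrative assumptions, not a real 68k
decoder and not what any actual 68040 or clone implements; immediates
and most of the instruction set are left out for brevity.

```python
# Hypothetical, heavily simplified model of a "RISC subset" filter.
# All three sets below are assumptions chosen for illustration only.
LOAD_STORE = {"move"}                       # treated as the load/store instruction
SIMPLE_ALU = {"add", "sub", "and", "or", "eor", "cmp"}
REGISTER_MODES = {"Dn", "An"}               # register direct
SIMPLE_MEMORY_MODES = {"(An)", "d16(An)"}   # the only memory modes kept

def in_risc_subset(mnemonic, src_mode, dst_mode):
    """Return True if the instruction form would stay on the fast path."""
    # Operands that are not plain registers.
    non_reg = [m for m in (src_mode, dst_mode) if m not in REGISTER_MODES]
    if mnemonic in LOAD_STORE:
        # A load or a store: at most one memory operand, in a simple mode.
        return len(non_reg) <= 1 and all(m in SIMPLE_MEMORY_MODES for m in non_reg)
    # Any other instruction: simple ALU operation, register-register only.
    return mnemonic in SIMPLE_ALU and not non_reg

if __name__ == "__main__":
    for form in [("add", "Dn", "Dn"), ("add", "(An)", "Dn"),
                 ("move", "d16(An)", "Dn"), ("move", "(An)", "(An)")]:
        print(form, in_risc_subset(*form))
```

A compiler retargeted to such a subset would simply never emit the forms
for which the check returns False, which is why recompilation matters.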

I think everybody remembers that when the PL.8 compiler was retargeted
to a RISC subset of 370 instructions, using only RR instructions for
non load/stores, the generated code was *faster* than otherwise; i.e.
the 370 is already often implemented internally as a RISC core with
paraphernalia appended.

henry> The real question is, which one will scale to higher clock speeds

Well, things are not that simple. We have three alternatives really:

    Pure RISC
        You only have simple instructions and load/store. Code is big,
        CPU has low transistor count, instructions are fast.

    Pure CISC
        You only have complex instructions and no special casing. Code
        is small, CPU has medium transistor count, instructions are
        slow.

    RISCy CISC
        You have simple instructions and addressing modes implemented
        as if they were RISC; complex instructions and addressing modes
        are there for backwards compatibility. Code is small, CPU has
        large transistor count, there are both slow and fast
        instructions.

Actually there is another alternative, mostly used in mainframes, e.g.
some 370s and very high end VAXes:

    Super CISC
        You have a super parallel CPU that decodes and executes complex
        instructions with lots of internal parallelism. Code is small,
        CPU has colossal transistor count, all instructions are fast.

henry> and more-than-one-instruction-per-cycle execution schemes better?
henry> Hint: the simpler one has a decided edge here.

In terms of cost effectiveness there seems to be evidence that Pure
RISC is better than Pure CISC. The choice between RISCy CISC and Pure
RISC is not that clear, however. Architectural efficiency is
comparable, so the contest, as indicated by Spencer, may be decided by
the much lower transistor count of Pure RISC, which allows use of more
advanced (faster if less dense) technology.

There are however technical factors that favour RISCy CISC. One is that
higher code density, which conserves memory bandwidth, is not
irrelevant, and the so called "RISC window", which happens when memory
gets relatively faster than CPUs, may be closing. Another is the ability
to support rare but important applications better, thanks to the CISC
part of the instruction set.

Non technical considerations are that usually the best (fastest or
densest) technology is only available to the largest manufacturers,
which are however wedded to CISC architectures; in a sense RISC is
therefore how smaller players get comparable performance even if they
use less advanced technology (vide SPARC on a gate array).

My opinion is that a million plus transistor budget would be better
spent on having multiple SPARCs/MIPSes/M88Ks/29Ks/ARMs/NOVIXes per chip
rather than a RISCy CISC, but the players who can afford a million plus
transistor budget have a vested interest in old, CISC architectures; and
that RISCs had better do something about code density, because the
relative speed of memory and CPU may change again. Stack RISCs, instead
of load-store ones, are my favourite dream.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk