Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!uwvax!dogie.macc.wisc.edu!csd4.milw.wisc.edu!lll-winken!uunet!portal!cup.portal.com!bcase
From: bcase@cup.portal.com (Brian bcase Case)
Newsgroups: comp.arch
Subject: Re: 80486 vs. 68040 code size [really: how many regs]
Message-ID: <18549@cup.portal.com>
Date: 19 May 89 17:59:00 GMT
References: <950@aber-cs.UUCP> <25651@amdcad.AMD.COM> <4228@ficc.uu.net>
Organization: The Portal System (TM)
Lines: 80

>How about some facts on the 32-bit version of the NOVIX, Phil Koopman's
>WISC chip, and Johns Hopkins Labs' stack-oriented chip? All of these
>chips were faster (more MIPS per MHz) than the ...(680xx, 80x86) in '87 -
>using fairly old technology.

Not hard to do. The same (old technology) could be said of the MIPS and SPARC implementations of the time.

>How fast might they be if they had a sustained development
>effort on the order required to produce the 29000 and the 88000?

I give up, how fast?

>Did the big-name chip developers miss something here? Why didn't any
>of them develop a dual (4?) stack chip, zero (ok 1 or 2) addressing
>modes, Harvard architecture (3 data paths), 16 (or 32) instructions that
>were essentially the chip's microcode (instruction bits fed directly
>into the control lines on chip, very little decode time)? All of these
>chips could do a call/branch in 1 cycle, return in 0 cycles, and passed
>parameters lived on the stack with everything else. Usually stacks
>were cached on chip with overflow to memory.

Except for the 0-cycle return, most RISC chips share the same attributes. However, have you considered that the implementations of commercial RISCs are constrained by, for example, virtual memory? Or by the need to support many different kinds of languages? Having 3 or 4 memory ports is CLEARLY a great idea if it fits in with the rest of the system design center. For Forth, A-OK. For a chip that's going to have TLB(s) on chip, not OK, since every additional memory port would need its own access to address translation.
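The quoted scheme of stacks "cached on chip with overflow to memory" can be modeled in a few lines. This is a minimal Python sketch under stated assumptions: the class name `StackCache`, the 4-entry depth, and the one-word-at-a-time spill/fill policy are hypothetical illustrations, not the design of the NOVIX, WISC, or any other real chip. When the on-chip buffer is full, a push spills the deepest cached word to memory; when it is empty, a pop refills the most recently spilled word.

```python
from collections import deque


class StackCache:
    """Toy model of an on-chip stack buffer that spills to memory.

    The depth and all names are hypothetical illustrations, not the
    design of NOVIX, WISC, or any real chip.
    """

    def __init__(self, depth=4):
        self.depth = depth       # on-chip entries (assumed size)
        self.onchip = deque()    # cached top of stack; right end = top
        self.memory = []         # deeper entries spilled off chip
        self.spills = 0          # words written to memory on overflow
        self.fills = 0           # words read back from memory on underflow

    def push(self, value):
        if len(self.onchip) == self.depth:
            # Buffer full: move the deepest on-chip entry out to memory.
            self.memory.append(self.onchip.popleft())
            self.spills += 1
        self.onchip.append(value)

    def pop(self):
        if not self.onchip:
            if not self.memory:
                raise IndexError("pop from empty stack")
            # Buffer empty: bring the most recently spilled entry back.
            self.onchip.append(self.memory.pop())
            self.fills += 1
        return self.onchip.pop()


# Push 10 words onto a 4-entry buffer, then pop them all.
s = StackCache(depth=4)
for v in range(1, 11):
    s.push(v)
popped = [s.pop() for _ in range(10)]
print(popped, s.spills, s.fills)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1] 6 6
```

Every word pushed beyond the on-chip depth costs one memory write on the way down and one memory read on the way back up. Real hardware might move several words per spill or fill; the one-word policy here only keeps the sketch short.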
To answer your questions:

None of the commercial RISCs have stack architectures, because such an architecture defeats compiler optimization strategies (keeping many values live at once and reordering instructions freely) and doesn't make use of the inherently powerful (high-bandwidth, low-latency) chunk of hardware called the 3-port (or even more ports) register file.

All commercial RISC chips have 0 or 1 addressing mode. (This is using my definition of RISC.)

Most commercial RISCs have a Harvard implementation, with separate paths for instructions and data. Adding more 32-bit paths would have bad consequences, such as slowing all of them down.

While decoding a NOVIX instruction is simpler than decoding most commercial RISC instructions, the difference is not important. What is important is that the pipeline of fetch, decode, execute, and write back (maybe with a memory stage in there too, as in MIPS) be uniform, that is, that all stages require essentially the same number of gate delays. The clock period is set by the slowest stage, so greatly simplifying the decoding beyond what has already been done is not productive for these architectures. (Analogy: RISCs have fast procedure call mechanisms. Speeding up the procedure call by another factor of 10 would have little consequence.)

Commercial RISCs have 1-cycle branches and 1-cycle returns (a return is really just a branch). It might have been possible to architect and implement an indirect branch (i.e., return) instruction that also specifies a (2-address) arithmetic op, similar to what the NOVIX et al. have. It would be somewhat complex and irregular. Scheduling the op in the branch delay slot is much more general and doesn't make that ALU op different from all the rest (2-address vs. 3-address). The win would probably be small in these architectures.

Commercial RISCs also cache the "stack," and everything else, on chip in the general-purpose 3-port register file. A processor has a memory hierarchy with the register file at the top of the pyramid. There are different ways to design the register file. NOVIX et al. chose one way (on-chip stacks); everybody else chose another (a randomly addressable, multiported register file). Which is more general purpose? Clearly, the 3-port register file.

One bit (OK, one-half bit) of wisdom: it is not always useful to look at the features of one architecture and then conjecture that another would be improved by adopting those features. The reason is that an architecture is a unit, the expression of a conceptual approach. There is a complex matrix of dependencies among the features of any architecture, so removing one feature can invalidate all or a large fraction of the others. It is like saying that Chopin's piano concertos need a heavy beat. The design center of the NOVIX chip, for example, permits several memory ports and a stack orientation. This is fine if you can live with 64K of stack. Chopin is great until you want to go disco dancing....

Claim: The best architectures are those that appear to have been designed by one person.
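The contrast drawn above between a stack orientation and the 3-port register file can be made concrete with a toy model. The sketch below is a hypothetical illustration: the instruction names, the `run_stack` and `run_regs` helpers, and the register assignments are assumptions, not the instruction set of the NOVIX or of any commercial RISC. It computes x = a*b + c and y = a*b - d both ways. On the stack machine, operands are implicit, so reusing a*b takes an explicit `dup` and the order of operations is dictated by stack position; on the three-address machine, each ALU op reads two registers and writes a third (the three ports), so a*b simply stays in `r5` and is named twice.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_stack(program, env):
    """Zero-address machine: ALU operands are implicit on the stack."""
    stack, out = [], {}
    for op, *arg in program:
        if op == "push":
            stack.append(env[arg[0]])     # fetch a named variable
        elif op == "dup":
            stack.append(stack[-1])       # copy top of stack for reuse
        elif op == "store":
            out[arg[0]] = stack.pop()     # pop result into a named variable
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[op](a, b))   # a op b replaces both operands
    return out


def run_regs(program, regs):
    """Three-address machine: each ALU op reads two registers, writes one."""
    regs = dict(regs)
    for op, dst, src1, src2 in program:
        regs[dst] = OPS[op](regs[src1], regs[src2])
    return regs


# x = a*b + c; y = a*b - d
stack_prog = [("push", "a"), ("push", "b"), ("mul",), ("dup",),
              ("push", "c"), ("add",), ("store", "x"),
              ("push", "d"), ("sub",), ("store", "y")]

# Hypothetical allocation: a, b, c, d already in r1..r4.
reg_prog = [("mul", "r5", "r1", "r2"),   # r5 = a*b, kept for reuse
            ("add", "r6", "r5", "r3"),   # r6 = x = a*b + c
            ("sub", "r7", "r5", "r4")]   # r7 = y = a*b - d

env = {"a": 3, "b": 4, "c": 5, "d": 6}
print(run_stack(stack_prog, env))        # {'x': 17, 'y': 6}
regs = run_regs(reg_prog, {"r1": 3, "r2": 4, "r3": 5, "r4": 6})
print(regs["r6"], regs["r7"])            # 17 6
```

The register version assumes a compiler has already placed a, b, c, and d in r1 through r4, while the stack version fetches them explicitly, so the two instruction counts are not a fair speed comparison; the point is only how each machine names its operands.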