Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!ucla-cs!zen!ucbvax!hplabs!nsc!voder!apple!bcase
From: bcase@apple.UUCP
Newsgroups: comp.arch
Subject: Re: D-machine helped spawn RISC
Message-ID: <6281@apple.UUCP>
Date: Fri, 18-Sep-87 13:43:00 EDT
Article-I.D.: apple.6281
Posted: Fri Sep 18 13:43:00 1987
Date-Received: Sun, 20-Sep-87 02:10:13 EDT
References: <347@erc3ba.UUCP> <478@esunix.UUCP> <2785@ames.arpa> <6266@apple.UUCP> <600@rocky.STANFORD.EDU>
Reply-To: bcase@apple.UUCP (Brian Case)
Organization: Apple Computer Inc., Cupertino, USA
Lines: 63

In article <600@rocky.STANFORD.EDU> andy@rocky.UUCP (Andy Freeman) writes:
>Flynn used the same compiler/optimizer with different final code generators
>to study a number of different architectures. (The compiler and optimizer
>were written under John Hennessy's direction a few years ago. Yes, that
>Hennessy.)

John Hennessy certainly knows up from down when it comes to compilers.
But, in my humble opinion, many really important optimizations happen
*after* code generation; I believe this is especially true for RISCs.
Looking at the output of modern, commercial "optimizing" compilers, I am
appalled at the code quality in certain cases. Just because a textbook
says that optimization occurs before code generation doesn't mean that's
the best way.

> All of the architectures had the same ALU; they differed in
>instruction format and register set architecture. (They compared different
>register window schemes with monolithic register sets of various sizes.)
>Since all of the tests used the same compiler and optimizer, much of the
>remaining differences were due to differences between the architectures.

This is the claim I don't believe, not even a little bit.

>Remember,
>the critical path in MIPS, MIPS-X, and the Berkeley RISC processors is
>not in the control logic; I don't know about MIPS Co's product.

I'm not so sure that I believe this statement.
It is true that, in most of the cases listed, little *area* was spent on
control logic, but, at least for the original Stanford MIPS, the master
pipeline controller was a real problem. Remember, the decision whether or
not to "complexify" the instruction-set definition is (or should be)
driven by what the software (compiler, OS) wants and can deal with, not
*only* by what the hardware can stand. The fact that I can maintain the
cycle time even if I "complexify" the instruction set does not mean it is
the right thing to do! What if the compiler never emits those complex
instructions?

>``From data traffic considerations, it seems that the [360-like CISC]
>with a register set of about size 16 plus a small data cache is preferable
>to multiple register sets for most area combinations.''

Again, I question the compiler effort here.

>Maybe instruction bandwidth isn't important, but data bandwidth seems
>to be. As Flynn and company conclude, ``*Balanced optimization* is
>the key to overall instruction set efficiency.'' Let's see some data
>from RISC folks.

Bandwidth is not the only consideration: LATENCY is often more important
where loads/stores are concerned (at least in machines like RISC II, the
Am29000, and, I suspect, SPARC, which have a relatively low percentage of
loads/stores). High instruction bandwidth is very important for RISC
machines; latency is also important, but there are techniques for dealing
with it so that it is less apparent at the chip boundary. Techniques like
interleaving and burst-mode memories (VDRAMs, SCDRAMs, nibble-mode parts,
etc.) can deliver sequential bandwidth, but if it takes 2 milliseconds to
get the first word, who cares? Latency, latency, latency. Thus, arguments
against RISC founded on bandwidth requirements will, at least when
directed at me, fall on deaf ears.

About the only real data that I can offer is that the percentage of
loads/stores for stack-cache machines (RISC II, SPARC, Am29000, etc.) is
often about half that observed in machines with only flat register files
(MIPS, etc.).
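To make the latency-versus-bandwidth point concrete, here is a toy timing model in Python. It is only a sketch: the cycle counts and load/store fractions are hypothetical and are not measurements from any of the machines named above. It assumes that the first word of any access costs a fixed latency and that later words in a burst arrive at the sustained (interleaved or burst-mode) bandwidth. It also pessimistically assumes that every load/store pays the full latency with nothing overlapping it.

```python
"""Toy memory-timing model: why latency, not bandwidth, dominates scalar loads.

All parameter values below are hypothetical, chosen only for illustration.
"""


def block_fetch_cycles(latency: float, words: int, words_per_cycle: float) -> float:
    """Cycles until the last of `words` sequential words arrives.

    The first word costs the full access latency; the remaining words
    stream in at the sustained (burst/interleaved) bandwidth.
    """
    if words < 1:
        raise ValueError("words must be >= 1")
    return latency + (words - 1) / words_per_cycle


def data_stall_per_instruction(load_store_fraction: float, latency: float) -> float:
    """Average data-access stall per instruction.

    Pessimistic assumption: every load/store pays the full latency and
    nothing overlaps it (no cache, no scheduling around the load).
    """
    return load_store_fraction * latency


if __name__ == "__main__":
    LATENCY = 10  # hypothetical cycles to the first word

    # Doubling sequential bandwidth helps an 8-word burst...
    print(block_fetch_cycles(LATENCY, 8, 1.0))  # 17.0
    print(block_fetch_cycles(LATENCY, 8, 2.0))  # 13.5
    # ...but does nothing for a single-word load.
    print(block_fetch_cycles(LATENCY, 1, 1.0))  # 10.0
    print(block_fetch_cycles(LATENCY, 1, 2.0))  # 10.0

    # Hypothetical load/store fractions, the second half the first:
    # halving the fraction halves the exposed data latency.
    print(data_stall_per_instruction(0.30, LATENCY))
    print(data_stall_per_instruction(0.15, LATENCY))
```

Under these assumptions, doubling sequential bandwidth shortens the 8-word burst from 17 to 13.5 cycles but leaves a single-word load at 10 cycles. For scalar loads, the cost only goes down if you cut the latency itself or the number of accesses that expose it.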