Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!samsung!uunet!mcsun!cernvax!chx400!bernina!neptune!inf.ethz.ch!brandis
From: brandis@inf.ethz.ch (Marc Brandis)
Newsgroups: comp.arch
Subject: Re: RISCizing a CISC processor
Message-ID: <17648@neptune.inf.ethz.ch>
Date: 7 Dec 90 14:49:03 GMT
References: <9012070105.AA02416@hcrlgw.crl.hitachi.co.jp>
Sender: news@neptune.inf.ethz.ch
Reply-To: brandis@inf.ethz.ch (Marc Brandis)
Organization: Departement Informatik, ETH, Zurich
Lines: 102

In article <9012070105.AA02416@hcrlgw.crl.hitachi.co.jp> joe@hcrlgw.crl.hitachi.co.JP (Dwight Joe) writes:
>I would like some input on the following idea to extend the life of
>CISC processors.
>
>Consider a hypothetical machine: IM 68386C (CISCized).
>First, determine the dynamic instruction profile of the target mix.

[ some stuff deleted ]

>Then, rank the instructions from highest frequency to lowest.

[ some more deleted ]

>In a CISC chip, there is a certain redundancy. In other words,
>some of the complex instructions can be written in terms of the
>simpler instructions.

[ some stuff deleted ]

>Now, from the ranking of the instructions, determine the
>smallest "i" such that all I[j] with "j > i" can be written
>in terms all I[k] with "k <= j". Designate the set of
>the first i instructions from the above ranking to
>be the "RISC Set".

[ some stuff deleted ]

>Now, using timing analysis, estimate the performance of
>implementing the RISC Set and the I/O Set in hardware
>and implementing the CISC Set as subroutines in
>a microcode store. These subroutines are written with
>instructions from the RISC Set.

This is exactly what modern computer architecture is all about. Look for
the often encountered cases and optimize these while accepting some
overhead for the less common ones. This technique has been heavily used
in the design of RISC processors, but it is not restricted to this area,
of course.
Dynamic distributions have also been used to optimize modern CISC chips.

The feasibility of encoding the less common instructions as combinations
of the "RISC set" in microcode ROM depends heavily on how well you can
express their functionality using the "RISC set". It also depends on how
much overhead you have to pay for the switch to microcode. According to
the Intel i960 CA User's Manual, the switch to microcode can be done in
0 cycles, but I am not sure that this is easy to achieve.

Note that it is not always easy to express the complex instructions of a
CISC processor in terms of simple instructions. Complex instructions
often have a lot of side effects, and you have to simulate them
correctly. One thing causing trouble is the condition-code register.
Some of the simpler instructions that you would like to use to simulate
the complex ones may change the condition code in a way that does not
match the semantics of the complex instruction. You can get rid of the
problem by introducing some new instructions (whether they are only
usable from the microcode ROM or not is a different issue), but it is
not an easy task.

Moreover, one of the tough parts in designing high-performance CISC
processors is making the instruction decoder fast. RISC processors
typically have very simple and regular instruction sets, where each
instruction has the same size. This makes decoding them straightforward.
Implementing an instruction decoder that can decode one instruction per
cycle for a complex instruction set is hard, as there are a lot of
different formats to be considered. Note that instruction decoding in a
CISC environment does not naturally lead to pipelined solutions, as you
need the size of the previous instruction in order to begin decoding the
current instruction in the right place.

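To make the dependency on the previous instruction's size concrete, here
is a minimal Python sketch. The toy encoding (instruction length stored
in the low two bits of the first byte) and all function names are
assumptions made purely for illustration; they do not correspond to the
386 or to any real instruction set.

```python
from typing import Callable, List


def fixed_length_starts(code_size: int, width: int = 4) -> List[int]:
    """Start offsets for a fixed-width encoding.

    Offset i is simply i * width, so every decoder slot can compute
    its own start position without looking at any other instruction.
    """
    return [i * width for i in range(code_size // width)]


def toy_length(code: bytes, pc: int) -> int:
    """Hypothetical toy encoding: the low 2 bits of the first byte
    hold (length - 1), giving instruction lengths of 1 to 4 bytes."""
    return (code[pc] & 0x03) + 1


def variable_length_starts(
    code: bytes, length_of: Callable[[bytes, int], int]
) -> List[int]:
    """Start offsets for a variable-length encoding.

    The start of instruction n+1 is only known once the length of
    instruction n has been determined, so the scan is inherently serial.
    """
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        pc += length_of(code, pc)  # depends on the previous instruction
    return starts


# Four toy instructions of lengths 1, 3, 2 and 4 bytes.
code = bytes([0x00, 0x02, 0xAA, 0xBB, 0x01, 0xCC, 0x03, 0x01, 0x02, 0x03])
print(fixed_length_starts(16))                   # [0, 4, 8, 12]
print(variable_length_starts(code, toy_length))  # [0, 1, 4, 6]
```

In the fixed-width case each offset is a pure function of the slot index,
whereas in the variable-length case the loop carries `pc` from one
iteration to the next, which is the serial chain a hardware decoder has
to break somehow.
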
>The great thing about the IM 68386R (RISCized) processor is
>that super-scalarizing it will be no harder than for
>a RISC processor because, we now essentially have
>a RISC processor (one with subroutines microcoded to
>handle CISC Set instructions). We will only be super-scalarizing
>the RISC Set, _not_ the full set of the IM 68386C.

No, here I disagree for two reasons. First, you have to treat the stream
of instructions as if the complex instruction had been replaced in place
by the stream from the microcode ROM. As this stream was originally
designed as one instruction, it is likely to contain a lot of
dependencies, so there is not much parallelism to be gained. You may get
rid of this problem by using huge reservation stations and at least one
level of speculative execution, but this means a lot of hardware.

Second, as you said before, instructions from the RISC set are often
encountered in the program. If you want to achieve an execution rate of
more than one instruction per cycle, your decoder (the one decoding the
CISC instruction set) has to decode more than one instruction per cycle.
As I already said, it is pretty hard to design a decoder that can decode
one instruction per cycle, let alone one that can do multiple
instructions per cycle. Note that the instructions have to be decoded
one after the other because of their varying size. One way to solve this
is to speculatively decode instructions starting at different offsets
and then to discard the wrong ones. Let us assume you want to decode
three instructions of the 386 instruction set per cycle on average. The
average instruction length on the 386 is 4.6 bytes, as far as I
remember, so three instructions span about 14 bytes. So with 14 (!!!)
instruction decoders, one per byte offset, you should have a reasonable
chance to get 3 instructions decoded per cycle.

>The other great thing is that the IM 68386R is upward compatible
>with the IM 68386C and can use its large installed base of
>programs.

Here I heavily disagree. It would be better to get away from these
architectures as soon as possible. Note that for every hour these
machines are around, new software is being written for them (software
that may not be easily ported to other architectures), giving more and
more weight to your argument.

Marc-Michael Brandis
Computer Systems Laboratory, ETH-Zentrum (Swiss Federal Institute of
Technology)
CH-8092 Zurich, Switzerland
email: brandis@inf.ethz.ch