Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site mips.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!whuxlm!harpo!decvax!decwrl!Glacier!mips!larry
From: larry@mips.UUCP (Larry Weber)
Newsgroups: net.arch
Subject: Re: risc, cisc, and microprogramming
Message-ID: <145@mips.UUCP>
Date: Fri, 21-Jun-85 01:40:01 EDT
Article-I.D.: mips.145
Posted: Fri Jun 21 01:40:01 1985
Date-Received: Sun, 23-Jun-85 02:16:41 EDT
References: <557@hou2b.UUCP> <1078@peora.UUCP> <334@spar.UUCP>
Organization: MIPS Computer Systems, Mountain View, CA
Lines: 39

>
> It is also "difficult" to write system software for that subclass of RISC
> machine that simplifies hardware to the point of requiring the software to
> allow for the resolution of pipeline data-dependencies. It would be an
> Interesting Task to do a code-generator for such a beast.
> ...
> Jay Reynolds Freeman (Schlumberger Palo Alto Research)(canonical disclaimer)

This is simply not true. The simplest way to resolve pipeline dependencies
is to put into the instruction stream exactly as many nop instructions as
the hardware would insert delay cycles on a machine that has interlocks.
This does not make the program run any slower, because the hardware would
have delayed the program anyway.

The real gain comes when you modify the order of instructions to eliminate
delay cycles. The complexity of the machine determines the complexity of
the algorithm. On simpler machines you can actually resolve delays in 200
lines of code. A number of years ago I modified IBM's compiler for their
systems language for the 370 to resolve an interlock between computation
and address calculation. It resulted in a 6% improvement in performance;
this modest improvement is respectable when you consider that the compiler
was already highly tuned to produce very good code.
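
The two steps described above (pad with nops, then reorder to remove the
padding) can be illustrated with a toy model. This is only a sketch under
stated assumptions, not the scheduler of any real compiler: the `Instr`
record, the register names, and the single one-instruction load delay are
all hypothetical, the code is straight-line (no branches), and there are no
stores, so memory ordering is ignored.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str                  # mnemonic, e.g. "load" or "add" (hypothetical)
    dest: str                # register written, "" if none
    srcs: Tuple[str, ...]    # registers read


NOP = Instr("nop", "", ())


def needs_delay(prev: Instr, nxt: Instr) -> bool:
    """Assumed hazard: a load's result is not usable by the very next instruction."""
    return prev.op == "load" and prev.dest != "" and prev.dest in nxt.srcs


def insert_nops(code: List[Instr]) -> List[Instr]:
    """Step 1: emit a nop wherever interlocking hardware would have stalled."""
    out: List[Instr] = []
    for ins in code:
        if out and needs_delay(out[-1], ins):
            out.append(NOP)
        out.append(ins)
    return out


def can_swap(a: Instr, b: Instr) -> bool:
    """True if adjacent instructions a, b can be exchanged (a originally first)."""
    if a.dest and a.dest in b.srcs:    # b reads what a writes
        return False
    if b.dest and b.dest in a.srcs:    # b overwrites what a reads
        return False
    if a.dest and a.dest == b.dest:    # both write the same register
        return False
    return True


def fill_load_delays(code: List[Instr]) -> List[Instr]:
    """Step 2 (greedy): pull an independent instruction into each load delay slot."""
    code = list(code)
    i = 0
    while i + 2 < len(code):
        load, dep, cand = code[i], code[i + 1], code[i + 2]
        if (needs_delay(load, dep)
                and not needs_delay(load, cand)
                and can_swap(dep, cand)):
            code[i + 1], code[i + 2] = cand, dep
        i += 1
    return code


program = [
    Instr("load", "r1", ("r4",)),
    Instr("add", "r2", ("r1", "r5")),
    Instr("load", "r3", ("r6",)),
    Instr("add", "r7", ("r3", "r2")),
]

naive = insert_nops(program)                        # nops only
scheduled = insert_nops(fill_load_delays(program))  # reorder, then pad what is left

print(naive.count(NOP), scheduled.count(NOP))
```

In this example the second load is independent of the first add, so moving
it up fills the first delay slot and at the same time separates the second
load from its own consumer. Running `insert_nops` after the reordering pass
keeps the output correct even where no candidate instruction was found.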

On RISC machines it is possible to achieve a much larger improvement
because you can schedule a wider class of delays: branch delays, load
delays, coprocessor delays, etc. I believe that pipeline scheduling is an
excellent "peephole" optimization whether the hardware has interlocks or
not.

The problem of code generation needn't be any harder than on any other
machine. In theory you would like to generate instructions that are good
candidates to resolve delays just produced or about to be produced. In
practice, all you have to do is not "over-work" a single register, which
would make all the instructions interdependent.

--
-Larry B Weber
UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!larry
DDD:  415-960-1200
USPS: MIPS Computer Systems, 1330 Charleston Rd, Mtn View, CA 94043