Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site mips.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!whuxlm!harpo!decvax!decwrl!Glacier!mips!larry
From: larry@mips.UUCP (Larry Weber)
Newsgroups: net.arch
Subject: Re: risc, cisc, and microprogramming
Message-ID: <145@mips.UUCP>
Date: Fri, 21-Jun-85 01:40:01 EDT
Article-I.D.: mips.145
Posted: Fri Jun 21 01:40:01 1985
Date-Received: Sun, 23-Jun-85 02:16:41 EDT
References: <557@hou2b.UUCP> <1078@peora.UUCP> <334@spar.UUCP>
Organization: MIPS Computer Systems, Mountain View, CA
Lines: 39

> 
> It is also "difficult" to write system software for that subclass of RISC
> machine that simplifies hardware to the point of requiring the software to
> allow for the resolution of pipeline data-dependencies.  It would be an
> Interesting Task to do a code-generator for such a beast.  
> ... 
> Jay Reynolds Freeman (Schlumberger Palo Alto Research)(canonical disclaimer)

This is simply not true.  The simplest way to resolve pipeline dependencies
is to put into the instruction stream exactly as many nop instructions as
the cycles the hardware would stall on a machine that has interlocks.  This
does not make the program run any slower, because the hardware would have
delayed the program anyway.
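
A minimal sketch of that nop-padding pass, in Python for brevity.  The
instruction format, the register names and the one-cycle load delay are
assumptions made for illustration; they do not describe any particular
machine.

```python
from typing import List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str                  # mnemonic, e.g. "load" or "add"
    dest: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]    # registers read

# Assumed latencies: extra cycles before a result may be read.
DELAY = {"load": 1}
NOP = Instr("nop", None, ())

def insert_nops(code: List[Instr]) -> List[Instr]:
    """Pad `code` with nops wherever an interlock would have stalled."""
    out: List[Instr] = []
    ready = {}  # register -> first slot in `out` at which it may be read
    for ins in code:
        # Earliest slot at which every source operand is available.
        need = max((ready.get(r, 0) for r in ins.srcs), default=0)
        while len(out) < need:
            out.append(NOP)
        out.append(ins)
        if ins.dest is not None:
            ready[ins.dest] = len(out) + DELAY.get(ins.op, 0)
    return out

code = [
    Instr("load", "r1", ("fp",)),
    Instr("add", "r2", ("r1", "r3")),   # reads r1 right after the load
]
padded = insert_nops(code)
print([i.op for i in padded])   # ['load', 'nop', 'add']
```

The padded stream takes the same number of cycles that an interlocked
machine would spend on the original two instructions.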

The real gain comes when you modify the order of instructions to eliminate
delay cycles.  The complexity of the machine determines the complexity
of the algorithm.  On simpler machines you can actually resolve delays 
in 200 lines of code.  A number of years ago I modified IBM's compiler for 
their systems language for the 370 to resolve an interlock between 
computation and address calculation.  It resulted in a 6% improvement in 
performance; this modest improvement is respectable when you consider that
the compiler was already highly tuned to produce very good code.  On RISC
machines it is possible to achieve a much larger improvement because you
can schedule a wider class of delays - branch delays, load delays, coprocessor
delays etc.
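
As a rough illustration of reordering, here is a small greedy scheduler for
one straight-line block, written in Python.  It is a sketch under stated
assumptions, not the algorithm used in any compiler mentioned above: the
instruction format and the one-cycle load delay are hypothetical, only
register dependencies are tracked, and the block is assumed to contain no
stores or branches.

```python
from typing import List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]

DELAY = {"load": 1}          # assumed: one delay cycle after a load
NOP = Instr("nop", None, ())

def depends(later: Instr, earlier: Instr) -> bool:
    """True if `later` must stay after `earlier` (RAW, WAR or WAW)."""
    raw = earlier.dest is not None and earlier.dest in later.srcs
    war = later.dest is not None and later.dest in earlier.srcs
    waw = later.dest is not None and later.dest == earlier.dest
    return raw or war or waw

def raw_delay(later: Instr, earlier: Instr) -> int:
    """Delay cycles needed when `later` reads what `earlier` wrote."""
    if earlier.dest is not None and earlier.dest in later.srcs:
        return DELAY.get(earlier.op, 0)
    return 0

def schedule(block: List[Instr]) -> List[Instr]:
    n = len(block)
    preds = [[j for j in range(i) if depends(block[i], block[j])]
             for i in range(n)]
    issued = {}               # original index -> cycle of issue
    out: List[Instr] = []
    while len(issued) < n:
        cycle = len(out)
        pick = None
        for i in range(n):    # first ready instruction in original order
            if i in issued or any(j not in issued for j in preds[i]):
                continue
            if all(cycle >= issued[j] + 1 + raw_delay(block[i], block[j])
                   for j in preds[i]):
                pick = i
                break
        if pick is None:
            out.append(NOP)   # nothing ready: the delay cannot be hidden
        else:
            issued[pick] = cycle
            out.append(block[pick])
    return out

block = [
    Instr("load", "r1", ("fp",)),
    Instr("add", "r2", ("r1", "r1")),
    Instr("load", "r3", ("fp",)),
    Instr("add", "r4", ("r3", "r3")),
]
print([i.dest for i in schedule(block)])   # ['r1', 'r3', 'r2', 'r4']
```

In original order this block needs a nop after each load, six slots in all.
Moving the second load into the first load's delay slot gets it down to
four, with no nops.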

I believe that pipeline scheduling is an excellent "peephole" optimization
whether the hardware has interlocks or not.

The problem of code generation needn't be any harder than on any other
machine.  In theory you would like to generate instructions that are good
candidates to resolve delays just produced or about to be produced.  In
practice, all you have to do is not "over-work" a single register, which
would make all instructions interdependent.
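
To make the "over-worked" register point concrete, the Python sketch below
counts how many instruction pairs are tied together by a register
dependence.  The instruction format and the two example sequences are
hypothetical, and the pairwise test is deliberately conservative: it does
not notice when an intervening write has already killed a value.

```python
from typing import List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]

def depends(later: Instr, earlier: Instr) -> bool:
    """True if `later` cannot be moved above `earlier` (RAW, WAR or WAW)."""
    raw = earlier.dest is not None and earlier.dest in later.srcs
    war = later.dest is not None and later.dest in earlier.srcs
    waw = later.dest is not None and later.dest == earlier.dest
    return raw or war or waw

def dependent_pairs(code: List[Instr]) -> int:
    """Number of (earlier, later) pairs whose order is fixed."""
    return sum(depends(code[i], code[j])
               for i in range(len(code)) for j in range(i))

# Both loads go through r1, so nearly everything is ordered.
one_temp = [
    Instr("load", "r1", ("fp",)),
    Instr("add", "r2", ("r2", "r1")),
    Instr("load", "r1", ("fp",)),
    Instr("add", "r3", ("r3", "r1")),
]
# Same computation with a second temporary, r4.
two_temps = [
    Instr("load", "r1", ("fp",)),
    Instr("add", "r2", ("r2", "r1")),
    Instr("load", "r4", ("fp",)),
    Instr("add", "r3", ("r3", "r4")),
]
print(dependent_pairs(one_temp), dependent_pairs(two_temps))   # 5 2
```

With the second temporary only the two load/add pairs remain ordered, so a
scheduler is free to interleave them.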
-- 
-Larry B Weber
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!larry
DDD:  	415-960-1200
USPS: 	MIPS Computer Systems, 1330 Charleston Rd, Mtn View, CA 94043