Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!hplabs!hpda!hpcuhc!hpsemc!gph
From: gph@hpsemc.HP.COM (Paul Houtz)
Newsgroups: comp.sys.ibm.pc
Subject: Re: Optimizers and RISC (Re: Why unix doesn't catch on)
Message-ID: <8090021@hpsemc.HP.COM>
Date: 9 May 89 17:36:09 GMT
References: <3181@looking.UUCP>
Organization: HP Technology Access Center, Cupertino, CA
Lines: 59

allbery@ncoast.ORG (Brandon S. Allbery) writes:

>Optimization under RISC consists primarily of (1) recognizing that some
>loops will run faster when "unwound" into linear code and (2) optimizing the
>use of registers.  The latter involves copying often-used values into a
>register once instead of constantly fetching it from memory, and recognizing
>Both can be applied to CISC code; in fact, register optimization is, if
>anything, *more* useful on hardware which doesn't have registers to spare.

>The difference that RISC brings to it is that both are very nearly
>*required* for RISC, whereas CISC can get by without either; consider yours

The statement in the first paragraph is false.  RISC optimization does not
consist PRIMARILY of unwinding loops and optimizing registers.  Optimizing
registers is a big part of RISC optimization, but you left out filling
branch delay slots.  You see, most RISC implementations are pipelined,
and branches cause bubbles in pipelines.  Therefore RISC architectures
usually delay the effect of a branch until after the instruction that
follows it has executed.  That way, the bubble in the pipeline can be
filled with useful work.

Architecture non-specific example:

     Normally you would code:

         ld r2, *+2            Load return address in register 2
         bl subprog            branch to subprogram

 subprog st r25,$ADDIT
         ad $ADDIT,1
         .
         .
         b r2

     In a RISC implementation, the programs would look the same, except the
     code to get to the subroutine would be "optimized" to look like this:

        bl subprog
        ld r2, *+1
 
     Since the instruction after the branch WILL be executed before the
     branch takes effect, you can save a couple of cycles by placing the
     load instruction after the branch.
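
For what it's worth, the slot-filling step can be sketched as a toy pass
in Python.  This is only an illustration under stated assumptions: the
(opcode, destination, sources) tuple format, the opcode names, and the
function fill_delay_slots are all made up here, and the pass ignores
labels, memory dependences, and everything else a real scheduler has to
worry about.

```python
# Toy model, not tied to any real instruction set.
# Each instruction is a tuple: (opcode, destination, sources).
BRANCHES = {"b", "bl"}          # hypothetical branch opcodes
NOP = ("nop", None, ())


def fill_delay_slots(code):
    """Return a new list in which every branch is followed by one delay slot.

    If the instruction just before a branch is independent of the branch,
    it is moved into the slot; otherwise the slot is padded with a nop.
    """
    out = []
    protected = -1  # index in `out` of the most recent delay slot
    for instr in code:
        op, dest, srcs = instr
        if op not in BRANCHES:
            out.append(instr)
            continue
        prev_index = len(out) - 1
        can_move = (
            prev_index >= 0
            # never steal the delay slot of an earlier branch
            and prev_index != protected
            # the branch must not read what the previous instruction writes
            and out[prev_index][1] not in srcs
        )
        slot = out.pop() if can_move else NOP
        out.append(instr)
        out.append(slot)
        protected = len(out) - 1
    return out
```

Fed the call sequence above (the load followed by the branch), this
returns the branch followed by the load.  When the branch reads the
register that the previous instruction writes, it leaves the order alone
and pads the slot with a nop instead.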
 

The RISC optimizers will fill branch delay slots first, then look at 
unwinding loops and then register optimization.  However, in the compilers
I have looked at, register optimization is something the compiler does, not
the optimizer.  Of course, not all systems are the same.
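
Loop unwinding, for reference, looks something like the sketch below.
Python is used only to show the shape of the transformation (it will not
run any faster in an interpreter); the function names and the unwind
factor of 4 are arbitrary choices for illustration.

```python
def sum_rolled(values):
    """Straightforward loop: one loop test and branch per element."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unwound(values):
    """Same result, with the body repeated 4 times per loop test."""
    total = 0
    n = len(values)
    i = 0
    # Main loop: four elements per iteration, so one branch per four adds.
    while i + 4 <= n:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    # Cleanup loop for the 0 to 3 elements left over.
    while i < n:
        total += values[i]
        i += 1
    return total
```

The point on a pipelined machine is that the unwound version takes far
fewer branches for the same work, so there are fewer bubbles to fill in
the first place.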

Also, I think it is incorrect to say that RISC architectures MUST be
optimized to have acceptable performance.  I don't think that this is
borne out by the scientific evidence available.

Rather, I would say that RISC code can benefit more from optimization, since
there is no ROM microcode.  You see, you cannot combine millions of lines
of ROM code and then optimize the whole mess, because it is ROM.  With
RISC, all the code is RAM resident.  Theoretically, once the compiler
pulls in the code it needs to do what CISC instructions used to do, the
whole thing can then be optimized.