Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!hplabs!hpda!hpcuhc!hpsemc!gph
From: gph@hpsemc.HP.COM (Paul Houtz)
Newsgroups: comp.sys.ibm.pc
Subject: Re: Optimizers and RISC (Re: Why unix doesn't catch on)
Message-ID: <8090021@hpsemc.HP.COM>
Date: 9 May 89 17:36:09 GMT
References: <3181@looking.UUCP>
Organization: HP Technology Access Center, Cupertino, CA
Lines: 59

allbery@ncoast.ORG (Brandon S. Allbery) writes:
>Optimization under RISC consists primarily of (1) recognizing that some
>loops will run faster when "unwound" into linear code and (2) optimizing the
>use of registers.  The latter involves copying often-used values into a
>register once instead of constantly fetching it from memory, and recognizing
>Both can be applied to CISC code; in fact, register optimization is, if
>anything, *more* useful on hardware which doesn't have registers to spare.
>The difference that RISC brings to it is that both are very nearly
>*required* for RISC, whereas CISC can get by without either; consider yours

The statement in the first paragraph is false.  RISC optimization does not
consist PRIMARILY of unwinding loops and optimizing registers.

Optimizing registers is a big part of RISC optimization, but you left out
filling branch delay slots.  You see, most RISC implementations are
pipelined implementations, and branches cause bubbles in pipelines.
Therefore the RISC architectures usually allow the branch to occur after
the instruction following the branch.  That way, the bubble in the pipeline
can be filled.

Architecture non-specific example:

Normally you would code:

         ld   r2, *+2         Load return address in register 2
         bl   subprog         branch to subprogram

subprog  st   r25,$ADDIT
         ad   $ADDIT,1
           .
           .
         b    r2

In a RISC implementation, the programs would look the same, except that the
code to get to the subroutine would be "optimized" to look like this:

         bl   subprog
         ld   r2, *+1

Since the instruction after the branch WILL be executed before the branch
takes effect, you can save a couple of cycles by placing the load
instruction after the branch.

The RISC optimizers will fill branch delay slots first, then look at
unwinding loops, and then at register optimization.  However, in the
compilers I have looked at, register optimization is something the compiler
does, not the optimizer.  Of course, not all systems are the same.

Also, I think it is incorrect to say that RISC architectures MUST be
optimized to have acceptable performance.  I don't think that this is borne
out by the scientific evidence available.  Rather, I would say that RISC
code can benefit more from optimization, since there is no ROM microcode.
You see, you cannot combine millions of lines of ROM code and then optimize
the whole mess, because it is ROM.  With RISC, all the code is RAM resident.
Theoretically, once the compiler pulls in the code it needs to do what CISC
instructions used to do, the whole thing can then be optimized.
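
For anyone who wants to play with the delay-slot idea from the example
earlier in this post, here is a rough sketch of a slot-filling pass in
Python.  It is only a toy: the Insn record, the BRANCHES set and
fill_delay_slots() are names made up for this illustration, it assumes
exactly one delay slot per branch, and it ignores labels and memory
dependences, so it should not be read as how any real optimizer does it.

```python
from typing import List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    """Hypothetical instruction record, invented for this sketch."""
    op: str                       # mnemonic, e.g. "ld", "bl", "nop"
    dest: Optional[str] = None    # register written, if any
    srcs: Tuple[str, ...] = ()    # registers read
    target: Optional[str] = None  # branch target label, if any


BRANCHES = {"b", "bl"}  # assumed set of branch mnemonics


def fill_delay_slots(code: List[Insn]) -> List[Insn]:
    """Follow every branch with exactly one delay-slot instruction.

    The instruction just before the branch is moved into the slot when
    the two do not touch each other's registers and that instruction is
    not already sitting in an earlier branch's slot.  Otherwise the slot
    is padded with a nop.
    """
    out: List[Insn] = []
    last_is_slot = False  # True when out[-1] already fills a delay slot
    for insn in code:
        if insn.op not in BRANCHES:
            out.append(insn)
            last_is_slot = False
            continue
        prev = out[-1] if out and not last_is_slot else None
        safe = (
            prev is not None
            and prev.dest not in insn.srcs      # branch does not read prev's result
            and insn.dest not in prev.srcs      # prev does not read what branch writes
            and (insn.dest is None or insn.dest != prev.dest)
        )
        if safe:
            out.pop()
            out.extend([insn, prev])            # prev now runs in the slot
        else:
            out.extend([insn, Insn("nop")])     # nothing safe to move
        last_is_slot = True
    return out


# The call sequence from the example: load the return address, then branch.
call = [Insn("ld", dest="r2"), Insn("bl", target="subprog")]
print([i.op for i in fill_delay_slots(call)])   # prints ['bl', 'ld']
```

Note that a return like "b r2" preceded by an instruction that writes r2
gets a nop instead, since the branch needs that register before the slot
instruction would run.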