Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!samsung!zaphod.mps.ohio-state.edu!rpi!leah!albanycs!crdgw1!crdos1!davidsen
From: davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr)
Newsgroups: comp.arch
Subject: Re: '040 vs. SPARC (was: Next computer...)
Message-ID: <2114@crdos1.crd.ge.COM>
Date: 9 Feb 90 18:30:37 GMT
References: <8905@portia.Stanford.EDU> <160@zds-ux.UUCP> <38415@apple.Apple.COM> <2101@crdos1.crd.ge.COM> <19233@dartvax.Dartmouth.EDU> <2105@crdos1.crd.ge.COM> <29099@amdcad.AMD.COM>
Reply-To: davidsen@crdos1.crd.ge.com (bill davidsen)
Organization: GE Corp R&D Center, Schenectady NY
Lines: 44

In article <29099@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:

| But the complex instruction typically binds many operations together,
| *reducing* the ability to efficiently overlap subsequent operations.
| However, if the complex instruction is split into its constituent
| parts, there is much more opportunity for instruction scheduling.

  Performance depends on how it's done. If the CPU can't do anything
else when it starts a complex instruction, then the gains from possible
internal overlap of phases will have to outweigh the blocking of the
CPU. If the CPU can continue to execute at least some other
instructions, then a smart compiler can probably find instructions.

  This isn't black and white, where all complex instructions are a lose
and all simple ones are a win. Volume of instructions impacts memory
bandwidth, too.

| Either this will take an extra cycle to write back the incremented
| address register (in which case an explicit add is just as fast), or
| an extra register file port just to write the incremented address at
| the same time the load data is written. If more register file ports
| are going to be added, I'd rather issue multiple, general-purpose
| instructions, which have a much greater chance of being used than a
| limited auto-increment mode.

  What I said about memory bandwidth applies here, but even more to the
point, a load or store through a pointer (address register) usually has
at least one cycle of overhead after the address is used, even with
cache. That cycle can be used to do the increment without slowing
anything down, and without running another instruction decode.

  The issue which I believe is primary is whether the added complexity
of the instruction decode slows it down. Given the number of gates
available I believe the answer is "usually not."

  There are people who argue against having increment at all, stating
that it's not general purpose and that the increment should be two
discrete instructions, namely (1) load the immediate value 1 into a
second register, and (2) add the second register to the register to be
incremented. I don't agree with this, either, but I can see that it is
the ultimate extension of the RISC method.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
            "Stupidity, like virtue, is its own reward" -me
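
To make the trade-off argued above concrete, here is a toy cost model
for one "load through a pointer, then advance the pointer" step. It is
only a sketch under stated assumptions, not a description of the '040,
SPARC, or any real CPU: the machine is in-order, a load costs one issue
cycle plus `dead_cycles` of overhead after the address is used, and
every ALU instruction costs one cycle. The names `Cost`,
`load_and_bump_cost`, `spare_write_port` and `fill_dead_cycles` are
hypothetical, invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    instructions: int  # instructions fetched and decoded per step
    cycles: int        # cycles spent per step in this toy model


def load_and_bump_cost(form, dead_cycles=1, spare_write_port=True,
                       fill_dead_cycles=False):
    """Cost of one load-and-advance step (ASSUMED toy model, see lead-in).

    form: "autoinc"      load with auto-increment, one instruction
          "explicit_add" load, then a separate add to the pointer
          "li_then_add"  load, load-immediate 1, then register add
    """
    load_cycles = 1 + dead_cycles  # issue cycle plus post-address overhead

    if form == "autoinc":
        # The increment hides in the dead cycle only if one exists and a
        # register file port is free to write it back; otherwise the
        # write-back costs an extra cycle.
        hidden = dead_cycles >= 1 and spare_write_port
        return Cost(1, load_cycles + (0 if hidden else 1))

    alu_ops = {"explicit_add": 1, "li_then_add": 2}[form]
    # A scheduler may slide the pointer arithmetic into the dead cycles,
    # since it depends on the address register and not on the loaded data.
    overlapped = min(alu_ops, dead_cycles) if fill_dead_cycles else 0
    return Cost(1 + alu_ops, load_cycles + alu_ops - overlapped)


if __name__ == "__main__":
    for form in ("autoinc", "explicit_add", "li_then_add"):
        print(form, load_and_bump_cost(form))
```

Under these assumptions both positions show up in the model. With
`fill_dead_cycles=True` the explicit add matches auto-increment in
cycles but still costs a second instruction fetch and decode, and with
`spare_write_port=False` auto-increment loses its cycle advantage.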