Path: utzoo!attcan!uunet!crdgw1!crdos1!davidsen
From: davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr)
Newsgroups: comp.arch
Subject: Re: '040 vs. SPARC (was: Next computer...)
Message-ID: <2105@crdos1.crd.ge.COM>
Date: 8 Feb 90 19:29:45 GMT
References: <8905@portia.Stanford.EDU> <160@zds-ux.UUCP> <38415@apple.Apple.COM> <2101@crdos1.crd.ge.COM> <19233@dartvax.Dartmouth.EDU>
Reply-To: davidsen@crdos1.crd.ge.com (bill davidsen)
Organization: GE Corp R&D Center, Schenectady NY
Lines: 56

In article <19233@dartvax.Dartmouth.EDU> jskuskin@eleazar.dartmouth.edu (Jeffrey Kuskin) writes:

| Yes, but how much do we benefit from the richer instruction sets, even
| if all the instruction are hardwired and execute at 1 cycle/instruction?
| Isn't one of the RISC folks' main arguments for simple instruction sets
| that current compilers don't effectively exploit the complex addressing
| modes and instructions supported in CISC chips?

  A fair question, but hard to answer in the middle ground. There are
some instructions which make code generation and execution faster for
almost all applications, such as mpy and div. The question has always
been if the gates could be better used to make something else faster,
not if those instructions would be useful.

  At the other end, there are instructions which are really special
purpose, and I don't think that anyone would argue for including them in
a general purpose CPU, such as the FFT instruction I discussed here a
few weeks ago.

  The answer is that instructions should be added if the sequence of
simple instructions to do the same thing is (a) common, and (b) slower.
If the sequence is more than a few instructions long some tradeoff comes
in because fewer instructions mean fewer hits on the memory. The guide
has got to be the overall speed of the CPU for a general mix (assuming a
g.p. CPU), rather than aiming for a single benchmark.
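
  To put rough numbers on the "(a) common, and (b) slower" test, here is
a back-of-the-envelope sketch in C. The model and every name in it
(net_speedup, frac, clock_penalty, and so on) are illustrative
assumptions, not anything measured on a real CPU: it simply weighs the
cycles saved on the replaced sequence against a uniform slowdown charged
to the whole machine for the extra gates.

```c
/* Rough model for illustration only; all names and parameters here are
 * assumptions, not taken from any real processor.
 *
 * frac           fraction of baseline run time spent in the sequence of
 *                simple instructions (how "common" it is), 0.0 to 1.0
 * seq_cycles     cycles the simple-instruction sequence takes
 * instr_cycles   cycles the single complex instruction would take
 * clock_penalty  factor by which the whole machine slows down to pay for
 *                the new instruction (1.0 = free, 1.05 = 5% slower)
 *
 * Returns old_time / new_time, so a value above 1.0 is a net win.
 */
double net_speedup(double frac, double seq_cycles,
                   double instr_cycles, double clock_penalty)
{
    /* Time left after the sequence is replaced, relative to baseline. */
    double remaining = (1.0 - frac) + frac * (instr_cycles / seq_cycles);

    /* The penalty applies to everything, including the new instruction. */
    return 1.0 / (clock_penalty * remaining);
}
```

  Under these assumptions, net_speedup(0.5, 4.0, 2.0, 1.0) comes out at
about 1.33, while net_speedup(0.02, 4.0, 1.0, 1.05) is about 0.97: a
rare sequence does not pay for even a 5% slowdown elsewhere, however
much faster the new instruction is.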

  This compromise leaves room for lots of competition, because
performance is a function of load characteristics to some extent. As
long as adding the instructions and addressing modes doesn't slow down
other stuff, directly or by stealing gates, they can be a net win.

  Another compromise is in register scoreboarding. By using a complex
instruction, part of its execution may be overlapped with execution of
the following instructions. This rapidly gets into interactions between
the compiler quality and features.

  I am told that the 586 will have an SPU for the string operations.
While I would expect this to have very little effect on general
performance, kernel bitmap searches and bitblt *may* now be overlappable
with other things. Is this a better use of gates than more cache? Is the
rumor even true?

  I don't claim to have the answers, but I have some programs which use
strchr(), strcat(), memcpy(), and such *very* heavily, and I would be
willing to try writing a few routines in assembler if I could get 20-30%
better performance. You have to take advantage of the hardware.

  Some addressing complexity, at least in the form of autoincrement, is
usually a win, but it may require a smart compiler or scoreboarding to
make best use of it. Operations directly to memory are a favorite
whipping boy of the RISC people, but they often save use of a register
and save two instructions, and if they allow fewer registers to be
implemented, or fewer to be saved on a context switch when only dirty
registers are saved, they may be an overall win.

  Sorry for the long reply, but I said initially that the question was
complex.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
            "Stupidity, like virtue, is its own reward" -me