Path: utzoo!attcan!uunet!crdgw1!crdos1!davidsen
From: davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr)
Newsgroups: comp.arch
Subject: Re: '040 vs. SPARC (was: Next computer...)
Message-ID: <2105@crdos1.crd.ge.COM>
Date: 8 Feb 90 19:29:45 GMT
References: <8905@portia.Stanford.EDU> <160@zds-ux.UUCP> <38415@apple.Apple.COM> <2101@crdos1.crd.ge.COM> <19233@dartvax.Dartmouth.EDU>
Reply-To: davidsen@crdos1.crd.ge.com (bill davidsen)
Organization: GE Corp R&D Center, Schenectady NY
Lines: 56

In article <19233@dartvax.Dartmouth.EDU> jskuskin@eleazar.dartmouth.edu (Jeffrey Kuskin) writes:

| Yes, but how much do we benefit from the richer instruction sets, even
| if all the instructions are hardwired and execute at 1 cycle/instruction?
| Isn't one of the RISC folks' main arguments for simple instruction sets
| that current compilers don't effectively exploit the complex addressing
| modes and instructions supported in CISC chips?  

  A fair question, but hard to answer in the middle ground. Some
instructions make code generation and execution faster for almost all
applications, such as mpy and div. The question there has always been
whether the gates could be better used to make something else faster,
not whether those instructions would be useful. At the other end are
instructions which are really special purpose, such as the FFT
instruction I discussed here a few weeks ago, and I don't think anyone
would argue for including them in a general purpose CPU.

  The answer is that an instruction should be added if the sequence of
simple instructions which does the same thing is (a) common, and (b)
slower. If the sequence is more than a few instructions long another
tradeoff comes in, because fewer instructions mean fewer instruction
fetches from memory. The guide has got to be the overall speed of the
CPU for a general mix (assuming a g.p. CPU), rather than the result on a
single benchmark. This compromise leaves room for lots of competition,
because performance is to some extent a function of load characteristics.
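
  To put rough numbers on the "common and slower" test, here is a
back-of-the-envelope model in C. It is only a sketch: the function name,
its parameters, and the idea of folding the "stealing gates" cost into a
single cycle-time stretch factor are my assumptions for illustration,
not measurements of any real chip.

```c
/*
 * Illustrative model only; all names and parameters are hypothetical.
 *
 * total_cycles   cycles for the workload using simple instructions only
 * occurrences    dynamic count of the instruction sequence in question
 * seq_cycles     cycles the simple-instruction sequence takes each time
 * fused_cycles   cycles the proposed single instruction would take
 * cycle_stretch  factor (>= 1.0) by which the clock period grows if the
 *                extra logic slows everything else down
 *
 * Returns old_time / new_time; a value above 1.0 is a net win.
 */
double net_speedup(double total_cycles, double occurrences,
                   double seq_cycles, double fused_cycles,
                   double cycle_stretch)
{
    double new_cycles = total_cycles
                      - occurrences * seq_cycles
                      + occurrences * fused_cycles;
    return total_cycles / (new_cycles * cycle_stretch);
}
```

  With a 1000 cycle workload in which a 4 cycle sequence occurs 100
times, a 1 cycle replacement gives 1000/700, about 1.43, if the clock is
unaffected. Stretch the clock by 50% and the same change becomes a net
loss (1000/1050), which is the "stealing gates" case.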

  As long as adding the instructions and addressing modes doesn't slow
down other stuff, directly or by stealing gates, they can be a net win.
Another compromise is in register scoreboarding. With a complex
instruction, part of its execution may be overlapped with execution of
the following instructions. This rapidly gets into interactions between
compiler quality and hardware features.

  I am told that the 586 will have an SPU for the string operations.
While I would expect this to have very little effect on general
performance, kernel bitmap searches and bitblt *may* now be overlappable
with other things. Is this a better use of gates than more cache? Is the
rumor even true? I don't claim to have the answers, but I have some
programs which use strchr(), strcat(), memcpy(), and such *very*
heavily, and I would be willing to try writing a few routines in
assembler if I could get 20-30% better performance. You have to take
advantage of the hardware.
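
  Before hand-coding anything I would want a harness along these lines.
This is a sketch in C: candidate_strchr() is a hypothetical stand-in for
whatever tuned routine gets written (here just plain C, not assembler),
and the helper names are mine. It checks the replacement against the
library strchr() and turns two timings into a percent improvement.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a hand-tuned routine. */
char *candidate_strchr(const char *s, int c)
{
    unsigned char target = (unsigned char)c;

    for (;; s++) {
        if ((unsigned char)*s == target)
            return (char *)s;      /* also matches the terminator when c is 0 */
        if (*s == '\0')
            return NULL;
    }
}

typedef char *(*strchr_fn)(const char *, int);

/* Returns 1 if fn agrees with the library strchr() on every case tried. */
int agrees_with_strchr(strchr_fn fn)
{
    static const char *samples[] = { "", "a", "hello, world", "aaaa", "xyz\tq" };
    size_t i;
    int c;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (c = 0; c < 256; c++)
            if (fn(samples[i], c) != strchr(samples[i], c))
                return 0;
    return 1;
}

/* Percent reduction in run time going from old_secs to new_secs. */
double percent_faster(double old_secs, double new_secs)
{
    return 100.0 * (old_secs - new_secs) / old_secs;
}
```

  The timings themselves would come from running the real programs, not
a toy loop, since the whole point is the effect on an actual load.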

  Some addressing complexity, at least autoincrement, is usually a win,
but it may take a smart compiler or scoreboarding to make best use of
it. Operations directly on memory are a favorite whipping boy of the
RISC people, but they often save a register and two instructions, and
if that allows fewer registers to be implemented, or fewer to be saved
on a context switch when only dirty registers are saved, it may be an
overall win.
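
  For concreteness, these are the C idioms I have in mind; the function
names are made up for the example. On a machine with autoincrement
addressing the body of the copy loop can in principle become a single
move, and on a machine with memory operands the counter bump can be one
instruction instead of load, add, store. Whether a given compiler
actually generates that code is another matter.

```c
#include <stddef.h>

/* Copy n bytes; each *p++ is a candidate for autoincrement addressing. */
void copy_bytes(char *dst, const char *src, size_t n)
{
    while (n-- > 0)
        *dst++ = *src++;
}

/*
 * Increment a counter in memory. A load/store machine needs a load,
 * an add, and a store, plus a scratch register; a machine with memory
 * operands may do it in one instruction with no register used.
 */
void bump_counter(long *counter)
{
    *counter += 1;
}
```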

  Sorry for the long reply, but I said initially that the question was
complex. 
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
            "Stupidity, like virtue, is its own reward" -me