Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!decwrl!amdcad!crackle!tim
From: tim@crackle.amd.com (Tim Olson)
Newsgroups: comp.sys.mac
Subject: Re: Pierce Explains RISC. was new mac rumors
Message-ID: <24629@amdcad.AMD.COM>
Date: 26 Feb 89 21:53:20 GMT
References: <70755@ti-csl.csc.ti.com> <9770@cit-vax.Caltech.Edu>
Sender: news@amdcad.AMD.COM
Reply-To: tim@amd.com (Tim Olson)
Organization: Advanced Micro Devices, Inc. Sunnyvale CA
Lines: 86

In article <9770@cit-vax.Caltech.Edu> wetter@cit-vax.Caltech.Edu (Pierce T. Wetter) writes:
| > 
| > This is very interesting.  I don't have the strongest background in
| > hardware architecture, but, could you please explain how a processor
| > could be optimized for a specific high level language?
| > 
|      The speed of a microprocessor is somewhat proportional to the number
| of instructions it has to implement. For instance the 6502 can do every 
| instruction in one clock cycle, while the 68000 can take up to 70.

This has more to do with the 6502 having hardwired decode, while the 68k
is microcoded.  (The 6502, by the way, takes at least 2 clock cycles to
execute an instruction, and can take up to 7, depending upon the
addressing mode.)

|   The other thing that most HLL need is lots and lots of registers. The
| 68000 has 16, but 1 is a stack pointer(a7), 1 points to the global area(a5), and
| another points to the local area(a6), and one is used to return function values
| (d0), leaving only 5 address and 7 data registers available. Some RISC chips
| on the other hand have up to 25 registers, a 256byte data cache, and a program
			       ^^^^^^^^^^^^
Most RISC chips have at least 31 GP registers (they usually reserve 1
for a constant 0); some have more.  SPARC implementations
currently have ~120 registers, and the Am29000 has 192.

|    That's the basic gist of how Reduced Instruction Set CPUs work; if you want more
| info you can get the data sheets for the 88000, or one of the other new RISC
| chips and they'll go into a lot more detail.
| Pierce

Here's another quickie explanation:


For a RISC machine to be faster than a CISC machine, it simply must take
fewer cycles to complete the overall program, even if this means
executing more instructions:

							1
	Performance = 1/sec = cycles/sec * -----------------------------
					   cycles/inst  *  [total inst]


Thus, we can improve performance by raising the cycles/sec (increasing
the clock frequency; basically a process-technology problem), decreasing
the total number of instructions executed (by making them complex: CISC),
or decreasing the number of cycles that an instruction requires (by
making them simple: RISC).  Note that these variables are not
independent; for example, it is hard to make very complex instructions
run in few cycles or at a high clock rate.
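
To put some numbers on the equation, here is a small sketch in C.  The
function names and every figure mentioned with it are made up purely for
illustration; they are not measurements of any real processor.

```c
#include <assert.h>

/* Time to run one program, in seconds:
 * (total inst * cycles/inst) / (cycles/sec). */
double exec_time_sec(double total_inst, double cycles_per_inst,
                     double cycles_per_sec)
{
    assert(total_inst > 0 && cycles_per_inst > 0 && cycles_per_sec > 0);
    return (total_inst * cycles_per_inst) / cycles_per_sec;
}

/* Performance as in the equation above: program runs per second (1/sec). */
double performance(double total_inst, double cycles_per_inst,
                   double cycles_per_sec)
{
    assert(total_inst > 0 && cycles_per_inst > 0 && cycles_per_sec > 0);
    return cycles_per_sec / (cycles_per_inst * total_inst);
}
```

Suppose a hypothetical CISC machine runs a program in 1,000,000
instructions at 4 cycles/inst, and a hypothetical RISC machine needs
1,500,000 instructions for the same program but averages 1.5 cycles/inst,
both at 16 MHz.  Then performance(1e6, 4.0, 16e6) gives 4 runs/sec and
performance(1.5e6, 1.5, 16e6) gives about 7.1 runs/sec: the RISC machine
executes 50% more instructions and still finishes first.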


That is the view from the hardware side.  However, software
(specifically, optimizing compilers) plays just as important a role in the
RISC performance picture.  One can make the argument that RISC & CISC
look very similar at the "micromachine" level, and that the fetching of
a microinstruction from the microcode on a CISC machine is somewhat like
a RISC machine fetching an instruction.  Now the CISC machine has
hard-wired microcode to execute from, while the RISC machine
instructions are "custom-tailored" by the compiler for the problem at
hand.

For example, let's look at a typical loop:

	for (i=0; i<MAX; ++i)
		a[i] = 0;

A CISC machine may have a single instruction that performs the inner
statement, by using an indexed base+offset addressing mode.  However,
each time through the loop it must fetch the 32-bit base address of the
array "a", multiply the index variable i by the size of the elements of
a, add the two values together to form an address, then store 0 out to
that location.

A highly-optimizing compiler can recognize that the base of the array
never changes (so it can be computed in a register before the loop
begins [loop-invariant code motion]), and we can increment this address
by the size of each element, rather than incrementing by 1 and then
multiplying (or shifting) [strength reduction].  Now the loop consists
of a few simple instructions (store, add, compare, branch), which
matches nicely with what is provided by the RISC machine (and they are
performed quickly, because they are executed directly instead of being
interpreted by another level of microcode).
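
The two transformations can be written out by hand in C to show roughly
what the compiler does.  This is only a sketch: the function names are
mine, the element type is assumed to be int, and a real compiler works on
its own intermediate form rather than on source like this.

```c
#include <stddef.h>

/* Before: the address of a[i] is rebuilt from scratch on every trip,
 * as base + index * element size. */
void clear_indexed(int *a, size_t max)
{
    size_t i;
    for (i = 0; i < max; ++i) {
        char *addr = (char *)a + i * sizeof(int);  /* multiply, add */
        *(int *)addr = 0;                          /* store */
    }
}

/* After: the base and the end address are computed once, before the loop
 * [loop-invariant code motion], and the address itself is stepped by one
 * element per trip instead of being recomputed from i
 * [strength reduction]. */
void clear_strength_reduced(int *a, size_t max)
{
    int *p = a;           /* base address, computed once */
    int *end = a + max;   /* loop bound, computed once */

    while (p != end) {    /* compare, branch */
        *p = 0;           /* store */
        ++p;              /* add sizeof(int) to the address */
    }
}
```

Both functions zero the same max elements; the second one's loop body is
just the store, add, compare, and branch mentioned above.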


	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)