Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Why floating point hardware: micro-parallelism, micro-cycles
Message-ID: <41518@mips.mips.COM>
Date: 14 Sep 90 23:41:34 GMT
References: <197@validgh.com> <CHUCK.PHILLIPS.90Sep9215755@halley.FtCollins.NCR.COM>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 219

In article <CHUCK.PHILLIPS.90Sep9215755@halley.FtCollins.NCR.COM> Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) writes:
>>>>>> On 9 Sep 90 15:17:44 GMT, dgh@validgh.com (David G. Hough on validgh) said:
>David> ...since floating-point instructions can be decomposed into simple
>David> integer operations, how can they be justified in a RISC
>David> architecture?  Why is it that they don't run as fast in software?
>David> (They don't, and can't, but you might have to try it to convince
>David> yourself.  All you need to do is look at 64-bit double precision
>David> floating-point add/subtract on a 32-bit RISC architecture).
>
>David> Basically I was attacking the idea that RISC = 'a few simple
>David> instructions'.  This was an overly simple definition anyway.  The
>David> correct definition of RISC architecture is 'good engineering' in the
>David> sense of 'good engineering economy', although not everybody has
>David> realized this yet.
>
>Perhaps RISC does indeed stand for Reduced Instruction Set, and "good
>engineering" can, and has, been applied to CISC architectures (notably the
>80486 and the 68040).
>
>Modern processor design is indeed indebted to the RISC pioneers who, in
>order to compensate for reduced instruction sets, applied "good
>engineering" to come up with some remarkable techniques for parallelism.
>_Except for the reduced number of instructions_, these same techniques can
>be applied to CISC (albeit some techniques with more difficulty).
>
>If a CISC processor _averages_ close to 1 Cycle Per Instruction, what is
>the advantage of removing many of those instructions?  Are you claiming a
>CISC processor is somehow transformed into a RISC processor because of an
>improved CPI, _even though the actual instruction set has not diminished_?
>(e.g.  the 68040 & 80486)
>
>In a given technology, the physics of the medium limits how fast a switch
>can toggle, leaving parallelism as the route for even greater throughput.
>It appears Reduced Instruction Sets and parallelism are, to a great degree,
>orthagonal.  Am I missing something here?
>
>Is it possible higher silicon densities will shift (or have shifted) the
>economics of processor design toward more robust parallelized instruction
>sets, perhaps even toward "Super CISC"?
>
>	Just for discussion,
>
>David> David Hough
>David> dgh@validgh.com	uunet!validgh!dgh	na.hough@na-net.stanford.edu
>
>#include <std/disclaimer.h>
>--
>Chuck Phillips  MS440
>NCR Microelectronics 			Chuck.Phillips%FtCollins.NCR.com
>2001 Danfield Ct.
>Ft. Collins, CO.  80525   		uunet!ncrlnk!ncr-mpd!bach!chuckp

There are a bunch of things in the following discussion that could
use some clarification, or amplification, so here goes:

In article <CHUCK.PHILLIPS.90Sep9215755@halley.FtCollins.NCR.COM> Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) writes:
>>>>>> On 9 Sep 90 15:17:44 GMT, dgh@validgh.com (David G. Hough on validgh) said:
>David> ...since floating-point instructions can be decomposed into simple
>David> integer operations, how can they be justified in a RISC
>David> architecture?  Why is it that they don't run as fast in software?
>David> (They don't, and can't, but you might have to try it to convince
>David> yourself.  All you need to do is look at 64-bit double precision
>David> floating-point add/subtract on a 32-bit RISC architecture).

>David> Basically I was attacking the idea that RISC = 'a few simple
>David> instructions'.  This was an overly simple definition anyway.  The
>David> correct definition of RISC architecture is 'good engineering' in the
>David> sense of 'good engineering economy', although not everybody has
>David> realized this yet.

Dgh has this right about FP (note that on a MIPS, 64-bit FP add = 2 cycles,
hard to match by sequences of integer instructions),
and it is a good example of what people really do, without
the confusion of counting instructions. 
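
To make the point concrete, here is a rough sketch of what the software
path looks like when only 32-bit integer operations are available.  It is
Python standing in for a 32-bit instruction stream, and the helper names
(split_words, join_words, add64, shr64, soft_add) are made up for
illustration.  It assumes positive normal operands, truncates instead of
rounding, and ignores signs, NaN, infinity, denormals and exponent overflow.
A real IEEE routine has to handle all of those too, which only adds to the
instruction count.

```python
import struct

MASK32 = 0xFFFFFFFF  # keep every value inside one 32-bit "register"

def split_words(x):
    """Return the (high, low) 32-bit words of the IEEE 754 double x."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 32, bits & MASK32

def join_words(hi, lo):
    """Rebuild a double from its (high, low) 32-bit words."""
    return struct.unpack(">d", struct.pack(">Q", (hi << 32) | lo))[0]

def add64(ahi, alo, bhi, blo):
    """64-bit add built from 32-bit adds plus an explicit carry."""
    lo = (alo + blo) & MASK32
    carry = 1 if lo < alo else 0
    hi = (ahi + bhi + carry) & MASK32
    return hi, lo

def shr64(hi, lo, n):
    """Logical right shift of a two-word value by n bits, 0 <= n <= 63."""
    if n == 0:
        return hi, lo
    if n >= 32:
        return 0, hi >> (n - 32)
    return hi >> n, ((hi << (32 - n)) | (lo >> n)) & MASK32

def soft_add(x, y):
    """Add two positive normal doubles using only 32-bit-sized steps."""
    ahi, alo = split_words(x)
    bhi, blo = split_words(y)
    ea, eb = ahi >> 20, bhi >> 20          # sign is 0, so this is the exponent
    if ea < eb:                            # make a the larger-exponent operand
        ahi, alo, bhi, blo = bhi, blo, ahi, alo
        ea, eb = eb, ea
    ahi = (ahi & 0xFFFFF) | 0x100000       # restore the implicit leading 1
    bhi = (bhi & 0xFFFFF) | 0x100000
    shift = ea - eb                        # align the smaller operand
    if shift > 63:
        bhi, blo = 0, 0
    else:
        bhi, blo = shr64(bhi, blo, shift)  # shifted-out bits are dropped
    hi, lo = add64(ahi, alo, bhi, blo)
    if hi & 0x200000:                      # sum carried into bit 53: renormalize
        hi, lo = shr64(hi, lo, 1)
        ea += 1
    return join_words((ea << 20) | (hi & 0xFFFFF), lo)

print(soft_add(1.5, 2.25))  # 3.75
```

Even this stripped-down version is an unpack, a compare-and-swap, a
two-word shift, a two-word add, a normalize step and a repack, each of
which is several integer instructions.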

>Perhaps RISC does indeed stand for Reduced Instruction Set, and "good
>engineering" can, and has, been applied to CISC architectures (notably the
>80486 and the 68040).
Good engineering can of course be applied to CISCs, and has been, for years.
If you track succeeding designs among, for example, the S/360 & VAX
families, you will find that the designers have carefully studied the
statistics of program behavior, moved some instructions from microcode
into hardware, or vice-versa, or even into software emulation.
Examples include:
	360/44 (didn't have decimal ops, for example)
	MicroVAX II (also didn't have decimal ops)
In addition, successive designs have generally gotten more efficient pipeline
designs and memory hierarchies.  Certainly, the 80486 is a fine implementation,
and the 68040 appears to be well-thought-out, from the published information.
This whole process goes on amongst all competent computer designers, and
has for many years.  It is not particularly new, nor would I expect any
knowledgeable RISC designer to tell you that it was something magic and new.

So what's the difference?  Let's try again:
	RISC micros were designed from the beginning:
	1) To avoid instruction complexity that would require microcode
	in general, which often costs you 1.5-2 : 1 if used for the
	simpler instructions.
	2) (In better cases) with a great deal of input from software
	people.  Since RISCs are newer, they have a lot of benefit from
	hindsight.  Since RISCs were designed when there was considerably
	more use of high-level languages and (sometimes) optimizing
	compilers, it was much easier to study these things and feed them
	into the design.  As it happens, compiler technology has taken
	leaps in the last decade, and the tradeoffs have changed.  That is not
	surprising, since the entire nature & structure of the computer
	business is a lot different from 10 years ago, and unbelievably
	different from 20 years ago.
	3) RISCs usually were designed after it was clear that caches were
	good things, and that let them make tradeoffs from Day 1, tradeoffs
	that were not necessarily appropriate for architectures designed
	when caches were either unknown or not practical for the part of
	the design space being attacked.  Also in this category are:
		a) Pure code segments
		b) Virtual memory support, if needed
	In some cases, some older machines allowed programs to
	write into their code any time they felt like it (like into the
	immediately succeeding instruction), or they included features
	that conflicted more with VM than they needed to.  All of these
	can be worked around, but hindsight...
	4) RISCs are generally designed to permit clean, simple pipelining,
	without requiring huge amounts of logic for special cases and such.
	This is certainly one of the key differences, and again, some of it
	comes from hindsight.
	5) Avoid those instructions that can easily be simulated by
	sequences of simple ones AT COMPARABLE PERFORMANCE.  Include those
	instructions, NO MATTER HOW "COMPLEX" someone thinks they are,
	if those instructions achieve performance that cannot be approximated
	elsewise, and if the tradeoffs are acceptable.  (again: include
	FP Add, which may well be a huge hunk of hardware, but don't
	include Translate&Test). 
It is interesting, as H&P point out, that never in the history of computing
have a bunch of ISA designs (note: just ISAs, nothing said about architecture
in general) done at the same time resembled each other as much
as the current crop of RISCs do.  (This is where they describe several
different chips by showing their relatively minor differences from their DLX.)
This doesn't mean there aren't important differences among them, but
machines that have 32-bit instructions, load/store orientation, usually 32
integer registers available at once, etc, etc, are a lot more alike, than,
say: IBM 1401, IBM 7074, and IBM 7094, or S/360, CDC 6600, Univac 1108, or
VAX & DG MV, or Intel 8086, Moto 68000, and NSC 32K.

>Modern processor design is indeed indebted to the RISC pioneers who, in
>order to compensate for reduced instruction sets, applied "good
>engineering" to come up with some remarkable techniques for parallelism.
>_Except for the reduced number of instructions_, these same techniques can
>be applied to CISC (albeit some techniques with more difficulty).
As noted, good engineering practice is good engineering practice,
and it didn't start with RISC.
However, the reduced number of instructions is the LEAST of the issues,
and people keep getting confused with this.  Much more relevant are
issues like:
	Operand and instruction alignment, especially in VM systems
	Number and especially kinds of addressing modes, especially
		multi-level indirect, for example.
	Number & size of operand fetches/writes caused by an instruction
	Multiple instruction sizes
	Number and kind of side-effects caused by an instruction, especially
		in VM systems
	Exception model
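
As a small illustration of the first item, here is a sketch of why operand
alignment matters so much in a VM system.  The 4096-byte page size and the
function names are assumptions for the example only, not taken from any
particular machine.  An unaligned operand can straddle two pages, so a
single access can fault twice (or fault partway through), while a
naturally aligned operand never can.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only

def pages_touched(addr, size, page_size=PAGE_SIZE):
    """Number of distinct pages covered by an operand of `size` bytes at `addr`."""
    first = addr // page_size
    last = (addr + size - 1) // page_size
    return last - first + 1

def is_naturally_aligned(addr, size):
    """True if the operand's address is a multiple of its size."""
    return addr % size == 0

# A 4-byte operand two bytes before a page boundary straddles two pages.
print(pages_touched(4094, 4), is_naturally_aligned(4094, 4))  # 2 False
# The aligned word just before the boundary stays within one page.
print(pages_touched(4092, 4), is_naturally_aligned(4092, 4))  # 1 True
```

With alignment required, the hardware only ever has to translate (and
possibly fault on) one page per operand, which takes a whole class of
special cases out of the pipeline.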

>
>If a CISC processor _averages_ close to 1 Cycle Per Instruction, what is
>the advantage of removing many of those instructions?  Are you claiming a
>CISC processor is somehow transformed into a RISC processor because of an
>improved CPI, _even though the actual instruction set has not diminished_?
>(e.g.  the 68040 & 80486)
Well, so far, 80486s don't appear to average close to 1 CPI, although,
as I've pointed out before, only the designers really know.
On the other hand, if you approximate CPI by MHz/(Integer-VAX-mips),
for machines for which that makes sense, and use SPEC integer = Integer-vax-mips,
you get numbers like these (from "Your Mileage May Vary", Issue 2.0):

MHz	SPEC-Int	MHz/SPEC	Chip	System (cache size)
25	11.2		2.23		SPARC	SUN SS1+ w/s (64K cache)
25	12.3		2.03		SPARC	Sun SS330 w/s (128K cache)
25	13.3		1.88		486	Intel-reported (128K)
25	18.3		1.37		88K	Moto 8864SP (128K)
25	19.4		1.29		R3000	MIPS Magnum 3000 w/s (64K)
25	19.7		1.27		R3000	MIPS M/2000, RC3260 (128K)
25	20.2		1.24		RS/6000	IBM RS/6000 model 530 w/s (72K)

Note, of course, that there is some element of apples&oranges here,
as these things are not completely contemporaneous in design,
have sometimes rather different silicon budgets, etc.
Still, if you believe clock/SPEC is anywhere near close to CPI for
these machines (it is for MIPS, but that's the only one I can be sure of),
the 486 is still off by a factor of 2.  (Mainframes would get closer to 1,
I think, and I suspect the '040 will do a little better also.)
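
For anyone who wants to redo the arithmetic, the third column of the table
is just the clock rate divided by the SPEC integer number.  A minimal
sketch, using only the figures from the table (the name approx_cpi is
mine, and the result is only as close to real CPI as the caveats above
allow):

```python
# (clock in MHz, SPEC integer, chip, system), copied from the table
ROWS = [
    (25, 11.2, "SPARC", "Sun SS1+"),
    (25, 12.3, "SPARC", "Sun SS330"),
    (25, 13.3, "486", "Intel-reported"),
    (25, 18.3, "88K", "Moto 8864SP"),
    (25, 19.4, "R3000", "MIPS Magnum 3000"),
    (25, 19.7, "R3000", "MIPS M/2000, RC3260"),
    (25, 20.2, "RS/6000", "IBM RS/6000 model 530"),
]

def approx_cpi(clock_mhz, spec_int):
    """Approximate CPI as MHz divided by VAX-relative integer mips."""
    return clock_mhz / spec_int

for clock, spec, chip, system in ROWS:
    print(f"{chip:8s} {system:24s} {approx_cpi(clock, spec):.2f}")
```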

Doing a heavily-streamlined implementation of a VAX, X86, 68K, etc ...
doesn't magically make them RISC architectures, but one shouldn't care
much, either (except for marketing :-).
The engineers are doing what they should be: making them go faster.
They sometimes have to squeeze harder to get everything in.
I have high respect for the implementation cleverness
that has often gone into such things, because it is VERY HARD WORK
to make ANYTHING go really fast, and people have to live with past
decisions.  Consider people who build mainframes (IBM & PCMs):
they must live with decisions made 25 years ago....

-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086