Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!labrea!decwrl!pyramid!prls!mips!mash
From: mash@mips.UUCP
Newsgroups: comp.arch
Subject: Re: brash micros versus the Big Iron: not yet
Message-ID: <649@winchester.UUCP>
Date: Tue, 1-Sep-87 23:48:23 EDT
Article-I.D.: winchest.649
Posted: Tue Sep  1 23:48:23 1987
Date-Received: Thu, 3-Sep-87 06:23:50 EDT
References: <622@winchester.UUCP> <12953@amdahl.amdahl.com> <630@winchester.UUCP> <1202@pdn.UUCP> <640@winchester.UUCP> <1221@pdn.UUCP>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 145

In article <1221@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:
>In article <640@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>>I'm curious: how does someone (other than a person with a complete
>>architectural simulator, or ultra-precise hardware monitors), ever know
>>what the `native mips' rating is, at least on machines that have
>>caches, TLBs, etc?

>Without access to the proper equipment, you either believe the reports
>of those who do, or make approximations based on timing the execution of
>a known number of instructions (dynamically counted).  Wasn't that 
>obvious?
1) Taking anything on faith in computer performance measurement: exciting.
2) Approximations based on timing the execution of a known number
of instructions:
	a) In any machine except a pure 1-cycle, 1-instruction RISC
	with no cache or TLB, it is HARD to get this to mean much,
	in general.
	b) You can write programs that do things like a million adds,
	and count the number of instructions in the loop.  This doesn't
	tell you much about real programs.
	c) You can run real programs, time them, and then go back and
	count the instructions.  This gets back to having a full-bore
	architectural simulator.
Again, this is not to denigrate cycles/instruction as something architects
use: I just have trouble when people start throwing the numbers around
with insufficient precision. I've even seen things published by people
comparing CPI of their systems and our systems, where we still cannot figure
out how they derived the numbers for our systems [or their systems, either].
I just have this strange fondness for numbers where I can know where they
come from, and where I can measure them without magic.
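
As a minimal sketch of the arithmetic behind 2b) and 2c) (Python; the
function names and every number below are made up for illustration and
are not measurements of any real machine), this is how a dynamic
instruction count, an elapsed time, and a clock rate turn into
"native mips" and CPI:

```python
def native_mips(instr_count, seconds):
    """Millions of native instructions executed per second."""
    return instr_count / seconds / 1e6


def cpi(instr_count, seconds, clock_hz):
    """Average cycles per instruction: total cycles / instructions."""
    return seconds * clock_hz / instr_count


CLOCK_HZ = 16.7e6  # assumed clock rate, illustration only

# 2b) a toy loop: 1M iterations of a hypothetical 4-instruction loop
loop_instrs, loop_secs = 4_000_000, 0.3
loop_mips = native_mips(loop_instrs, loop_secs)
loop_cpi = cpi(loop_instrs, loop_secs, CLOCK_HZ)

# 2c) a "real" program whose dynamic count came from a simulator
prog_instrs, prog_secs = 50_000_000, 10.0
prog_mips = native_mips(prog_instrs, prog_secs)
prog_cpi = cpi(prog_instrs, prog_secs, CLOCK_HZ)

print("loop:    %.2f mips, CPI %.2f" % (loop_mips, loop_cpi))
print("program: %.2f mips, CPI %.2f" % (prog_mips, prog_cpi))
```

The division is the easy part.  The two invented cases give quite
different answers on the same clock, and the instruction count in the
second case is exactly the number you cannot get without a simulator
or hardware monitor.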

>>(Are these the same kinds of mips as the scale I was using?
>>If so, how many people out there believe that a 16.7MHz 030 will be
>>3X faster than a Sun3/160, or 1.5X a Sun3/260?)

>The MIPS numbers were for native instructions/second executing 
>"average" code.  In other words, the 68030 is at least twice as fast
>as a 68020 at the same MHz (average speed).

>>>[reference to Motorola report which states that lower MIPS does not
>>>necessarily mean lower performance--in fact, the reverse may be true]

>>(Could you give a better citation for these reports?)  I think we all
>Sorry, I don't have one handy.  If you're really interested, I can mail
>you one.
Yes: please.  It would be useful to understand what they're saying.
>
>>agree that what is important is real work, not how many actual instructions
>>are being executed.  That's why I always use relative performance
>>numbers, as slippery as they are, since "peak mips", "sustained mips",
>>"native mips" really just don't mean much to most people.

>Using "normalized" MIPS I have no problem with--except for a MHz/MIPS
>ratio.  Using normalized MIPS in such a ratio is ridiculous, unless
>you want to "normalize" the MHz as well--clearly a bizarre idea.
>The whole point of the ratio is to see average cycles per instruction,
>not average cycles per unit-of-work, since a 6888x and a 34010 can have
>the same number of cycles per instruction, but very disproportionate
>cycles per unit-of-work DEPENDING ON THE TASK BEING MEASURED.  Once
>you have calculated the cycles/instruction ratio, then you can calculate
>a more meaningful work/instruction ratio from it.

I must be missing something.  What is wrong with cycles / (unit of work)?
If two CPUs (34010?) have the same CPI, but differ in cycles/(unit of work),
then it must be that one of them is executing a lot more instructions on
a given benchmark.  This was the original anti-RISC argument: "yes, the
instructions go faster, but you have to do a lot more of them, so the
actual work done is the same, or little more."  One of the reasons we've
gotten away from quoting CPI numbers was the continual objections of people
to the kind of analysis that says:
	performance = 1 / ((# instrs) * cycle-time * CPI)
with the claim that, given a constant cycle-time (dependent on technology),
RISCs a) radically reduced CPI, while b) only moderately increasing
(# instructions).  I believe that is basically true, more-or-less, given
that you throw in "at comparable hardware cost", since you can make CISCs
go fast by throwing lots of hardware at them.  THE PROBLEM IS THAT YOU COULD
GET CONVINCING NUMBERS FOR YOUR OWN MACHINES FOR INSTRUCTIONS and CPI,
but NOT for anybody else's, at least, you couldn't get numbers that
people would believe.
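
A worked example of that analysis, as a sketch only: it takes run time
for a fixed benchmark as instructions * CPI * cycle-time, with
performance as its reciprocal.  The instruction counts, CPIs, and
cycle time below are invented for illustration; they describe no real
CISC or RISC.

```python
def run_time_ns(instrs, cpi, cycle_ns):
    """Run time of one benchmark: instructions * CPI * cycle time."""
    return instrs * cpi * cycle_ns


def cycles_per_unit_of_work(instrs, cpi):
    """Cycles to finish the benchmark, treating it as one unit of work."""
    return instrs * cpi


CYCLE_NS = 60.0  # same technology, so same cycle time (assumed)

# Hypothetical machines: the RISC executes 30% more instructions,
# but at a quarter of the CPI.
cisc_instrs, cisc_cpi = 1_000_000, 6.0
risc_instrs, risc_cpi = 1_300_000, 1.5

cisc_time = run_time_ns(cisc_instrs, cisc_cpi, CYCLE_NS)
risc_time = run_time_ns(risc_instrs, risc_cpi, CYCLE_NS)
speedup = cisc_time / risc_time

# Same CPI, different instruction counts: cycles per unit of work
# differ by exactly the path-length ratio.
work_a = cycles_per_unit_of_work(1_000_000, 2.0)
work_b = cycles_per_unit_of_work(2_000_000, 2.0)

print("RISC/CISC speedup: %.2f" % speedup)
print("cycles per unit of work, A vs B: %.0f vs %.0f" % (work_a, work_b))
```

With equal CPI, the only way for cycles/(unit of work) to differ is
for the instruction counts to differ, which is the point of the
paragraph above.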

> >>It is amusing to see the hardware people champion "simplicity" of design
> >>at the expense of software complexity, while software people clamor
> >>for "simplicity" of design at the expense of hardware complexity
> >>(for example, Niklaus Wirth).  Shifting complexity around between
> >>hardware and software is just irresponsible sleight of hand unless
> >>you can prove, on a case-by-case basis, that a given function
> >>is more efficiently handled either in the hardware or the software. 
>
> >I think there's a complaint here, but I'm not exactly sure what it is,
> >or who it's aimed at.  If the first part was aimed at RISC fans, it
> >does seem a little misplaced, since many of the most vocal RISC
> >advocates are software people!  More specificity would help on the
> >rest: I'd be glad to answer the comments if I knew what they meant.
> 
>>-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
>
>There is a fixed cost associated with executing an instruction that has
>nothing to do with how 'simple' or 'complex' it is (the number of bits
>needed to encode the instruction IS important, however).  This argues
>that instructions should perform as much work as possible.

>Instructions that perform a lot of work may do more work than was
>needed, wasting machine resources.  This argues that instructions should
>do as little work as possible.  

>Obviously, it is necessary to find the optimum balance between these two
>conflicting constraints.  I would like to see a mathematical proof
>or theory that could demonstrate or discover just what that balance is.
>Without it, machine designers are little more than artists, not
>engineers.

I doubt there is a closed mathematical solution to this problem (if there were,
the optimal machines would be designed already!), but there is an iterative
process that has been used at various places, and is by now pretty
well-known:

a) Start with a set of benchmarks that you think are relevant to the
computers you want to build.
b) Define a baseline architecture.
c) Create compilers that can generate code for the baseline.
d) Create a simulator that can execute code for the architecture
and generate all the statistics you need to determine performance.
e) Now, iterate steps b-d by tweaking the architecture and fixing the others.
	- You might add an instruction, whose use lets you decrease the
	path-length, and it might be free, or it might increase the machine's
	cycle time, in which case you have to analyze the results carefully.
	- You might delete an instruction, increasing the path length,
	but perhaps decreasing the cycle time.
	- It turns out, both HP and we finally used a rule like "If you
	can't prove an instruction is worth 1% in overall performance,
	don't add it."
f) Sooner or later, you get tired, or you've converged as best you can,
or your venture capitalists would like a product sometime, and you
go build it.
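
The loop in step e) can be sketched as follows.  This is a toy model
under loud assumptions: the real process runs compilers and a
simulator over the benchmarks, whereas here each proposed change is
reduced to two invented multipliers (one on path length, one on cycle
time), CPI is held constant, and the candidate names and numbers are
hypothetical.  Only the 1% threshold comes from the rule quoted above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    path_length_scale: float  # multiplier on dynamic instruction count
    cycle_time_scale: float   # multiplier on machine cycle time


def run_time(instrs, cpi, cycle_ns):
    return instrs * cpi * cycle_ns


def iterate(candidates, instrs, cpi, cycle_ns, threshold=0.01):
    """Keep each change only if it buys at least `threshold` overall."""
    accepted = []
    for c in candidates:
        base = run_time(instrs, cpi, cycle_ns)
        new_instrs = instrs * c.path_length_scale
        new_cycle = cycle_ns * c.cycle_time_scale
        gain = base / run_time(new_instrs, cpi, new_cycle) - 1.0
        if gain >= threshold:
            instrs, cycle_ns = new_instrs, new_cycle
            accepted.append(c.name)
    return accepted, instrs, cycle_ns


candidates = [
    Candidate("add_A", 0.97, 1.00),    # shorter path, free
    Candidate("add_B", 0.95, 1.06),    # shorter path, slower cycle
    Candidate("add_C", 0.995, 1.00),   # free, but worth under 1%
    Candidate("drop_D", 1.02, 0.95),   # longer path, faster cycle
]

accepted, final_instrs, final_cycle = iterate(
    candidates, instrs=1_000_000, cpi=1.5, cycle_ns=60.0)
print("accepted:", accepted)
```

In this invented run, add_B is the case that "you have to analyze
carefully": its 5% path-length saving is more than eaten by a 6%
longer cycle, so it is rejected, while drop_D wins by going the
other way.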

Of course, in reality, you often have to do this under time pressure,
and your results are no better than your set of benchmarks, plus
the level of approximation offered by your compilers and simulators.
Picking the wrong benchmarks can be catastrophically bad if they
make you believe you can leave something out that shouldn't be.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086