Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!houxm!whuxl!whuxlm!akgua!gatech!seismo!hao!nbires!stcvax!crp
From: crp@stcvax.UUCP
Newsgroups: net.arch
Subject: Re: RISCs, caches, and vertical migration
Message-ID: <110@stcvax.UUCP>
Date: Fri, 14-Mar-86 20:53:59 EST
Article-I.D.: stcvax.110
Posted: Fri Mar 14 20:53:59 1986
Date-Received: Mon, 17-Mar-86 04:37:17 EST
References: <136@pyramid.UUCP> <5100024@ccvaxa> <2022@peora.UUCP>
Organization: Storage Technology Corp.  Louisville, CO
Lines: 99

> J. Eric Roskos @ Concurrent Computer Corp in Orlando says:

> One of the ongoing areas of research in microprogramming involves "vertical
> migration" -- analyzing sequences of code to determine things that can
> be migrated into the microcode, essentially to produce new instructions.
> From the RISC end you'd just go the other way; it's been argued that the
> cache does that "automatically," but it's hard to believe that
> in the long run, when the RISC approach has come to be seen as mundane,
> that someone doesn't start doing statistical analyses on RISC instruction
> sequences, and discovers that some sequences commonly occur,
> and makes new instructions out of those.
> But that's essentially identical to the vertical migration strategy.
> ... [a bit deleted] ...
> but I think the underlying approach is more or less the same.

One reason that vertical migration of frequent instruction sequences
"down" into a lower "interpretation layer" differs between microcoded machines
and RISC machines (as they are spoken about -- oops, I mean thought about)
today is that the RISC interpretation layer is hardware (today).
If you really need hardware to implement the instruction, then
even though an instruction sequence is common, the price/performance
tradeoff may still favor executing the instruction sequence over
"interpreting" a single new instruction in hardware.

For example, HP's new "Spectrum" machines don't have a multiply instruction.
The reason, they say, is that it made a WHOLE lot of difference in the
amount of ALU hardware needed, and the most common case,
a multiplier that is a small constant, can be handled cleverly
and quickly with the rest of their hardware and instruction set.
Apparently a general multiply subroutine is "fast enough" for the
remaining cases.
An "Electronics" (March 3, 1986) article about Spectrum says that
"multiplications by both variables and constants are done
in an average of three or four cycles - faster than the 20
or so cycles many multiplications consume in other machines."
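
To make the "small constant" trick concrete, here is a minimal sketch in C.
It is NOT HP's code and says nothing about the actual Spectrum instruction
set; the function names times10 and mul_shift_add are made up for
illustration.  It only shows the general idea: a multiply by a known small
constant turns into a couple of shifts and adds, and a general multiply can
be a short shift-and-add loop.

```c
#include <stdint.h>

/* Multiply by the constant 10 using only shifts and adds:
   10*x = 8*x + 2*x = (x << 3) + (x << 1).
   Hypothetical example; result is the product modulo 2^32. */
uint32_t times10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* General shift-and-add multiply subroutine (illustrative sketch).
   Examines one bit of b per iteration, so it stops early when the
   multiplier is small.  Result is the product modulo 2^32. */
uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    while (b != 0) {
        if (b & 1u)         /* low bit of multiplier set: add in a */
            product += a;
        a <<= 1;            /* a now stands for the next power of two */
        b >>= 1;            /* move to the next multiplier bit */
    }
    return product;
}
```

A compiler that knows the constant at compile time can emit the times10
form inline; the loop is the fallback for the case where both operands
are variables.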

The following is motherhood, but it seems to bear repeated articulation.

A system design, ANY system design, is done for some particular
- problem (program) domain and
- customer (pocketbook) domain.
The "RISC ballet" (or is it a demolition derby?)
is an optimization problem with variables that include
performance range required, cost range, hardware technologies
(CPU stuff and memories in particular), compiler technology,
programs from the application domain(s) supported, and other
things I either don't know or have forgotten at the moment.

I've noticed that people seem to want to talk about the instruction
set without considering the programs to be run.
You can't.
A decision to leave out integer multiply, for instance, can't be made
only on the basis of what it does to the chip area or complexity.
HP measured a "whole lot" of "real" programs from people who use
their existing machines (the mag says "hundreds of customer jobs
for over 100 customers").  They designed a machine to support
THIS APPLICATION MIX with some particular cost/performance
requirements and not *necessarily* any other mix.
They simulated the effect of their design decisions on programs from people
who gave them money once and might be induced to do so sometime again.
Apparently they were able to decide that integer multiply wasn't
important enough to enough people to hurt their business.
For them, for now, leaving out multiply is a Good Thing.
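
The shape of that decision can be sketched as a toy price/performance
model in C.  Everything here is an assumption for illustration: the
function names, the idea of scoring a design as cost times run time, and
any numbers you feed it.  None of it comes from HP or the Electronics
article.  The point is only that the answer depends on how often the
application mix multiplies.

```c
/* Average run time per instruction, in cycles, for a given mix.
   mul_fraction: fraction of executed instructions that are multiplies
   mul_cycles:   cycles per multiply on this design
   All non-multiply instructions are assumed to take one cycle. */
double avg_cycles(double mul_fraction, double mul_cycles)
{
    return mul_fraction * mul_cycles + (1.0 - mul_fraction) * 1.0;
}

/* Returns 1 if leaving out the hardware multiplier gives the better
   (lower) cost * time product for this mix, 0 otherwise.
   sw_mul_cycles / hw_mul_cycles: hypothetical multiply costs in cycles
   cost_without / cost_with:      hypothetical relative hardware costs */
int omit_multiply_wins(double mul_fraction,
                       double sw_mul_cycles, double hw_mul_cycles,
                       double cost_without, double cost_with)
{
    double score_without = cost_without * avg_cycles(mul_fraction, sw_mul_cycles);
    double score_with    = cost_with    * avg_cycles(mul_fraction, hw_mul_cycles);

    return score_without < score_with;
}
```

With made-up inputs of a 20-cycle software multiply, a 4-cycle hardware
multiply, and a 10% cost premium for the multiplier, a mix with 0.1%
multiplies favors leaving it out, and a mix with 20% multiplies favors
putting it in.  Same instruction set question, different answer,
depending on the programs.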

HP's Spectrum series has support for coprocessors -- like an FPU.
My own speculation is that they found that a reasonable multiplier
for the core instruction set didn't satisfy the high-performance
applications so they had to use a coprocessor (like the FPU)
for that part of the problem domain anyway.

So are RISC designs going to be "general purpose enough"?
I think so.  Most designers have taken a broad cross section of problem
domains for their design base, and the fundamental operations are similar
enough across most programs that the designs should carry over.
(Don't all programs spend their time branching and looping?)

John Mashey (mips!mash) comments that although MIPS hadn't targeted the
business program domain specifically, early experience is telling
him that their machine does a reasonable job with this kind of program.
I'm not surprised, since business programs don't do
anything remarkably different from text editors or compilers.
On the other hand, I suspect that the MIPS machine isn't
as good (i.e. effective and cost-effective) as a CRAY X-MP
(at the same clock speed) at predicting tomorrow's weather for India.
Not every computer is the right solution for every problem.

As technology changes, useful hardware/software solutions will change.
Architectures aren't forever -- especially today when [vu]lsi is
moving so fast.
One trick to survival for computer makers will be to very carefully
choose an architecture so that it makes sense with today's technology
and still makes sense with the technology available in the medium future.

charlie


-- 
Charlie Price   {hao ihnp4 decvax}!stcvax!crp   (303) 673-5698
USnail:	Storage Technology Corp  -  MD 3T / Louisville, CO / 80028