Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site peora.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!houxm!hjuxa!petsd!peora!jer
From: jer@peora.UUCP (J. Eric Roskos)
Newsgroups: net.arch
Subject: Re: RISCs, caches, and vertical migration
Message-ID: <2033@peora.UUCP>
Date: Tue, 18-Mar-86 09:08:33 EST
Article-I.D.: peora.2033
Posted: Tue Mar 18 09:08:33 1986
Date-Received: Wed, 19-Mar-86 05:23:52 EST
References: <136@pyramid.UUCP> <5100024@ccvaxa> <2022@peora.UUCP> <110@stcvax.UUCP>
Organization: Concurrent Computer Corporation, Orlando, Fl
Lines: 92

Charlie Price at Storage Technology* writes, in response to my suggesting
that there are analogies between vertical migration and the "optimizing
CISC" description of RISC cache:

> One reason that vertical migration of frequent instruction sequences
> "down" into a lower "interpretation layer" differs between microcoded
> machines and RISC machines (as they are spoken about -- oops, I mean
> thought about) today is that the RISC interpretation layer is hardware
> (today).

While I do agree with the remainder of the posting, the above suggests I
wasn't clear on what I meant in making the analogy.

The goal of vertical migration is to identify frequently-occurring
sequences of operations, and move them into the microcode.  The reason for
moving them into the microcode is the assumption that the microcode is in
a small, fast, but costly memory, and thus can be executed faster.  In
general, there's also the assumption that you can do more in microcode in
parallel, since horizontal microprograms are really commands to control
points in the machine, rather than conventional, familiar, sequential
instructions; but how applicable this is depends on how "horizontal" your
microinstruction is.
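
As a rough illustration of the first step (finding frequently-occurring
sequences), here is a minimal sketch that counts every run of N
consecutive operations in an execution trace.  The trace, the operation
mnemonics, and the function name frequent_sequences are all made up for
illustration; a real vertical migration tool would work from profile data
and would also have to weigh control store space against the expected
speedup.

```python
from collections import Counter

def frequent_sequences(trace, length, top=3):
    """Count each run of `length` consecutive operations in the trace
    and return the `top` most common runs with their counts."""
    windows = (tuple(trace[i:i + length])
               for i in range(len(trace) - length + 1))
    return Counter(windows).most_common(top)

# Hypothetical operation trace; the mnemonics are invented for this example.
trace = ["load", "add", "store",
         "load", "add", "store", "branch",
         "load", "add", "store", "cmp", "branch"]

for seq, count in frequent_sequences(trace, 3):
    print(count, " ".join(seq))
```

The sequences that come out on top are the candidates for moving into
microcode; how many of them actually get moved depends on how much
control store there is.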

The former assumption is the one used recently in here to claim that a
RISC is really an "optimizing CISC".  There is some moderate truth to that,
too, I think, although I have certain doubts at present.

Consider a CISC machine that has used vertical migration techniques to
give a good CISC instruction set for a particular program.  (Note that
there may indeed be different instruction sets for different *programs*;
several years ago, when I was involved briefly in research in this area
under R. I. Winner, we were working on implementing dynamic switching of
control store under Unix in order to support this approach.)  A program
on such a machine consists essentially of subroutine calls to RISC-like
subroutines, which are actually subroutines in the microprogram control
store.  The subroutine calls are made out of a slower memory external to
the CPU, and these calls are the CISC instructions.  Generally the CPU
treats this memory the way a user-level machine would treat a disk: it
has to explicitly address and fetch data out of the memory, and possibly
wait for it to arrive.

Now, in order for a RISC machine to work this way, i.e. in order for the
instructions in the cache to be somehow analogous to CISC instructions in
control store, there has to be both spatial locality and temporal locality
of reference for the frequently-used instruction sequence.  At the time I
made the original posting, I debated somewhat over whether to raise this
point, and decided not to, because it would be somewhat difficult to
determine exactly *how* similar this makes the two kinds of machine.

But the basic problem is that, if you say that the RISC machine is an
"optimizing CISC" because it keeps frequently executed instruction
sequences in cache, you have to have a compiler that recognizes these
frequently executed sequences, and puts them in one place (say, a
subroutine that is repeatedly invoked).  Having identical copies of a
sequence scattered through memory wouldn't work, because the cache works
on addresses; it doesn't know anything about the instruction sequences
per se, so you might fetch the same sequence multiple times just because
each copy lives at a different address and thus isn't in the cache when
you next need it.** On the other hand, if you put the
sequence in a subroutine, and aligned the subroutine such that it all
fit into cache and stayed there while you fetched instructions from
somewhere else that invoked that subroutine, then each time the
subroutine was called it would already be in cache, and the operation
would be analogous to a CISC with vertical migration of the sequence
that's in the subroutine into microcode.  I'm not sure it's safe
to say that makes the RISC with cache an "optimizing CISC", though, since
it requires as much information about the original program as would be
required to do the vertical migration in the first place.
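
The difference between the two layouts can be sketched with a toy cache
simulation.  Everything here is an assumption made for illustration: a
fully associative LRU instruction cache holding 16 one-instruction lines,
a loop body with four call sites, six instructions of filler code before
each site, a four-instruction recurring sequence, and no accounting for
the call and return instructions themselves.  The names count_misses and
build_traces are hypothetical.

```python
from collections import OrderedDict

def count_misses(trace, capacity):
    """Run an address trace through a fully associative LRU cache
    and return the number of misses."""
    cache = OrderedDict()              # resident addresses, oldest first
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)    # mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used
    return misses

def build_traces(seq_len=4, filler_len=6, sites=4, iterations=100):
    """Return (inline_trace, subroutine_trace) of instruction addresses
    for one big loop executed `iterations` times."""
    inline_body, sub_body = [], []
    sub_addr = 10_000                  # hypothetical home of the one shared copy
    next_inline = next_sub = 0
    for _ in range(sites):
        # Filler code unique to this site, present in both layouts.
        inline_body += range(next_inline, next_inline + filler_len)
        next_inline += filler_len
        sub_body += range(next_sub, next_sub + filler_len)
        next_sub += filler_len
        # The recurring sequence: a fresh copy inline, or the shared copy.
        inline_body += range(next_inline, next_inline + seq_len)
        next_inline += seq_len
        sub_body += range(sub_addr, sub_addr + seq_len)
    return inline_body * iterations, sub_body * iterations

inline_trace, sub_trace = build_traces()
print("inline copies:  ", count_misses(inline_trace, 16), "misses")
print("one subroutine: ", count_misses(sub_trace, 16), "misses")
```

In this toy model the loop body is bigger than the cache in both layouts,
so the filler code always misses.  The shared subroutine stays resident
only because it is referenced again before 16 other distinct lines have
gone by, which is exactly the kind of knowledge about the program that
the compiler would need to have.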

Which was why I said originally that I thought the two areas of research
probably had a lot in common and probably would tend to converge.  However,
the above is a fairly simplified description, since there are a lot of
questions in my own mind on how comparable the two approaches actually are
that would be hard to address productively in this sort of discussion.

--------
* A company that makes amazing tape drives!  We have one of them here.

** Note that I am thinking here of the more general case in which you just
have a repeatedly occurring sequence you want to put in a faster place;
this might happen, for example, in a large program which has a large loop
that is repeated many times, such that the loop is too big to fit into
the cache at once, but contains sequences of operations (maybe interspersed
with various other differing operations of varied lengths) that do recur
repeatedly.  I didn't mention the obvious case in which you have a loop
that can fit entirely in cache; and also I wasn't considering the
benefits of cache in a machine with a very wide bus, where you fetch 4 or
8 words into the cache all at the same time because it costs the same as
fetching 1 word.  Those are all special cases, and have been covered a lot
in here already, and it's been shown that they do provide significant
benefits themselves.
-- 
E. Roskos