Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site peora.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!houxm!hjuxa!petsd!peora!jer
From: jer@peora.UUCP (J. Eric Roskos)
Newsgroups: net.arch
Subject: Re: RISCs, caches, and vertical migration
Message-ID: <2033@peora.UUCP>
Date: Tue, 18-Mar-86 09:08:33 EST
Article-I.D.: peora.2033
Posted: Tue Mar 18 09:08:33 1986
Date-Received: Wed, 19-Mar-86 05:23:52 EST
References: <136@pyramid.UUCP> <5100024@ccvaxa> <2022@peora.UUCP> <110@stcvax.UUCP>
Organization: Concurrent Computer Corporation, Orlando, Fl
Lines: 92

Charlie Price at Storage Technology* writes, in response to my suggesting that there are analogies between vertical migration and the "optimizing CISC" description of RISC cache:

> One reason that vertical migration of frequent instruction sequences
> "down" into a lower "interpretation layer" differs between microcoded
> machines and RISC machines (as they are spoken about -- oops, I mean
> thought about) today is that the RISC interpretation layer is hardware
> (today).

While I do agree with the remainder of the posting, the above suggests I wasn't clear on what I meant in making the analogy.

The goal of vertical migration is to identify frequently-occurring sequences of operations, and move them into the microcode. The reason for moving them into the microcode is the assumption that the microcode is in a small, fast, but costly memory, and thus can be executed faster. In general, there's also the assumption that you can do more in microcode in parallel, since horizontal microprograms are really commands to control points in the machine, rather than conventional, familiar, sequential instructions; but how applicable this is depends on how "horizontal" your microinstruction is. The former assumption is the one used recently in here to claim that a RISC is really an "optimizing CISC".
There is some moderate truth to that, too, I think, although I have certain doubts at present. A CISC machine that has used the vertical migration techniques to give a good CISC instruction set for a particular program (and note that indeed there may be different instruction sets for different *programs*; several years ago when I was involved briefly in research in this area under R. I. Winner we were working on implementing dynamic switching of control store under Unix in order to support this approach) consists essentially of subroutine calls to RISC-like subroutines (actually subroutines in the microprogram control store); the subroutine calls are made out of a slower memory external to the CPU -- generally the CPU treats this memory like a user-level machine would treat a disk, and has to explicitly address and fetch data out of the memory and possibly wait for it to arrive -- and these calls are the CISC instructions.

Now, in order for a RISC machine to work this way, i.e. in order for the instructions in the cache to be somehow analogous to CISC instructions in control store, there has to be both a spatial locality and a locality of reference for the frequently-used instruction sequence. At the time I made the original posting, I debated somewhat over whether to raise this point, and decided not to, because it would be somewhat difficult to determine exactly *how* similar this makes the two kinds of machine. But the basic problem is that, if you say that the RISC machine is an "optimizing CISC" because it keeps frequently executed instruction sequences in cache, you have to have a compiler that recognizes these frequently executed sequences, and puts them in one place (say, a subroutine that is repeatedly invoked).
For example, having an identical sequence randomly distributed through memory wouldn't work, because the cache works spatially; it doesn't know anything about the instruction sequences per se, so for example you might be fetching in the same sequence multiple times just because it was spread through memory and thus not in the cache when you next needed it.** On the other hand, if you put the sequence in a subroutine, and aligned the subroutine such that it all fitted into cache and you could keep it in there while fetching instructions from somewhere else that invoked that subroutine, then each time the subroutine was called it would already be in cache, and the operation would be analogous to a CISC with vertical migration of the sequence that's in the subroutine into microcode.

I'm not sure it's safe to say that makes the RISC with cache an "optimizing CISC", though, since it requires as much information about the original program as would be required to do the vertical migration in the first place. That was why I said originally that I thought the two areas of research probably had a lot in common and probably would tend to converge.

However, the above is a fairly simplified description, since there are a lot of questions in my own mind on how comparable the two approaches actually are that would be hard to address productively in this sort of discussion.

--------
* A company that makes amazing tape drives! We have one of them here.

** Note that I am thinking here of the more general case in which you just have a repeatedly occurring sequence you want to put in a faster place; this might happen, for example, in a large program which has a large loop that is repeated many times, such that the loop is too big to fit into the cache at once, but contains sequences of operations (maybe interspersed with various other differing operations of varied lengths) that do recur repeatedly.
I didn't mention the obvious case in which you have a loop that can fit entirely in cache; and also I wasn't considering the benefits of cache in a machine with a very wide bus, where you fetch 4 or 8 words into the cache all at the same time because it costs the same as fetching 1 word. Those are all special cases, and have been covered a lot in here already, and it's been shown that they do provide significant benefits themselves.

--
E. Roskos
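
The footnoted case (a loop too big for the cache, containing a recurring sequence) can be illustrated with a toy simulation. What follows is only a minimal sketch under loud assumptions: a fully associative LRU cache holding one instruction per line, no return instruction, and made-up sizes. The function names and all five parameters are hypothetical and are not taken from any real machine or from the discussion above. It compares the same hot sequence copied inline at every site against the same sequence placed once in a subroutine.

```python
from collections import OrderedDict

def count_misses(trace, capacity):
    """Count misses for a fully associative LRU cache of `capacity` addresses."""
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used address
    return misses

def inline_trace(sites, filler, hot, iterations):
    """Hot sequence copied inline at every site, so each copy has its own addresses."""
    body = list(range(sites * (filler + hot)))
    return body * iterations

def subroutine_trace(sites, filler, hot, iterations):
    """Hot sequence stored once; each site runs filler, one call, then the shared code."""
    sub_base = sites * (filler + 1)        # subroutine placed just after the loop body
    sub = list(range(sub_base, sub_base + hot))
    body = []
    for s in range(sites):
        start = s * (filler + 1)
        body += list(range(start, start + filler + 1))  # filler plus the call instruction
        body += sub
    return body * iterations

# Hypothetical sizes: the loop body (8 sites) is far larger than the 64-entry cache.
SITES, FILLER, HOT, ITERATIONS, CAPACITY = 8, 20, 12, 10, 64

inline_misses = count_misses(inline_trace(SITES, FILLER, HOT, ITERATIONS), CAPACITY)
sub_misses = count_misses(subroutine_trace(SITES, FILLER, HOT, ITERATIONS), CAPACITY)
print("inline copies:", inline_misses, "misses")
print("one subroutine:", sub_misses, "misses")
```

In this model the inline copies are refetched on every pass because the cache sees only addresses, not the fact that the copies are identical, while the shared subroutine is loaded once and then stays resident. Note that the layout decision (finding the recurring sequence and putting it in one place) is made by hand in `subroutine_trace`, which is exactly the information a compiler would have to discover.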