Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!houxm!whuxl!whuxlm!akgua!gatech!seismo!hao!nbires!stcvax!crp
From: crp@stcvax.UUCP
Newsgroups: net.arch
Subject: Re: RISCs, caches, and vertical migration
Message-ID: <110@stcvax.UUCP>
Date: Fri, 14-Mar-86 20:53:59 EST
Article-I.D.: stcvax.110
Posted: Fri Mar 14 20:53:59 1986
Date-Received: Mon, 17-Mar-86 04:37:17 EST
References: <136@pyramid.UUCP> <5100024@ccvaxa> <2022@peora.UUCP>
Organization: Storage Technology Corp. Louisville, CO
Lines: 99

> J. Eric Roskos @ Concurrent Computer Corp in Orlando says:
> One of the ongoing areas of research in microprogramming involves "vertical
> migration" -- analyzing sequences of code to determine things that can
> be migrated into the microcode, essentially to produce new instructions.
> From the RISC end you'd just go the other way; it's been argued that the
> cache does that "automatically," but it's hard to believe that
> in the long run, when the RISC approach has come to be seen as mundane,
> that someone doesn't start doing statistical analyses on RISC instruction
> sequences, and discovers that some sequences commonly occur,
> and makes new instructions out of those.
> But that's essentially identical to the vertical migration strategy.
> ... [a bit deleted] ...
> but I think the underlying approach is more or less the same.

One reason that vertical migration of frequent instruction sequences "down" into a lower "interpretation layer" differs between microcoded machines and RISC machines (as they are spoken about -- oops, I mean thought about) today is that the RISC interpretation layer is hardware (today). If you really need hardware to implement the instruction, then even though an instruction sequence is common, the price/performance tradeoff may still favor executing the instruction sequence rather than "interpreting" a single instruction with hardware. For example, HP's new "Spectrum" machines don't have a multiply instruction.
The reason, they say, is that it made a WHOLE lot of difference in the amount of ALU hardware needed, and the most common case, a multiplier that is a small constant, can be effected cleverly and quickly with the remainder of their hardware and instruction set. Apparently a general multiply subroutine is "fast enough" for the remaining cases. An "Electronics" (March 3, 1986) article about Spectrum says that "multiplications by both variables and constants are done in an average of three or four cycles - faster than the 20 or so cycles many multiplications consume in other machines."

The following is motherhood, but it seems to bear repeated articulation. A system design, ANY system design, is done for some particular
  - problem (program) domain and
  - customer (pocketbook) domain.

The "RISC ballet" (or is it a demolition derby?) is an optimization problem with variables that include the required performance range, cost range, hardware technologies (CPU stuff and memories in particular), compiler technology, programs from the application domain(s) supported, and other things I either don't know or have forgotten at the moment.

I've noticed that people seem to want to talk about the instruction set without considering the programs to be run. You can't. A decision to leave out integer multiply, for instance, can't be made only on the basis of what it does to the chip area or complexity. HP measured a "whole lot" of "real" programs from people who use their existing machines (the mag says "hundreds of customer jobs for over 100 customers"). They designed a machine to support THIS APPLICATION MIX with some particular cost/performance requirements and not *necessarily* any other mix. They simulated the effect of their design decisions on programs from people who gave them money once and might be induced to do so again sometime. Apparently they were able to decide that integer multiply wasn't important enough to enough people to hurt their business.
For them, for now, leaving out multiply is a Good Thing.

HP's Spectrum series has support for coprocessors -- like an FPU. My own speculation is that they found that a reasonable multiplier for the core instruction set didn't satisfy the high-performance applications, so they had to use a coprocessor (like the FPU) for that part of the problem domain anyway.

So are RISC designs going to be "general purpose enough"? I think most designers have taken a broad cross section of problem domains for their design base, and the fundamental operations are similar enough across most programs that this will hold true. (Don't all programs spend their time branching and looping?) John Mashey (mips!mash) comments that, though MIPS hadn't specifically targeted the business program domain, early experience tells him that their machine does a reasonable job with this kind of program. I'm not surprised, since business programs don't do anything remarkably different from text editors or compilers. On the other hand, I suspect that the MIPS machine isn't as good (i.e. effective and cost-effective) as a CRAY XMP (at the same clock speed) at predicting tomorrow's weather for India. Not every computer is the right solution for every problem.

As technology changes, useful hardware/software solutions will change. Architectures aren't forever -- especially today, when [VU]LSI is moving so fast. One trick to survival for computer makers will be to choose an architecture very carefully, so that it makes sense with today's technology and still makes sense with the technology available in the medium-term future.

charlie
-- 
Charlie Price   {hao ihnp4 decvax}!stcvax!crp   (303) 673-5698
USnail: Storage Technology Corp - MD 3T / Louisville, CO / 80028