Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!sdd.hp.com!spool.mu.edu!uunet!stanford.edu!agate!riacs!pioneer.arc.nasa.gov!lamaster
From: lamaster@pioneer.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: RISC vs. CISC -- SPECmarks
Message-ID: <1991May7.195913.27363@riacs.edu>
Date: 7 May 91 19:59:13 GMT
References: <1991Apr30.163153.18568@midway.uchicago.edu> <1991May2.162909.9165@news.arc.nasa.gov> <819@cadlab.sublink.ORG> <1991May7.052417.10606@leland.Stanford.EDU>
Sender: news@riacs.edu
Reply-To: lamaster@pioneer.arc.nasa.gov (Hugh LaMaster)
Organization: RIACS, NASA Ames Research Center
Lines: 65

In article <1991May7.052417.10606@leland.Stanford.EDU>, dhinds@elaine18.Stanford.EDU (David Hinds) writes:
|> In article <819@cadlab.sublink.ORG> martelli@cadlab.sublink.ORG (Alex Martelli) writes:
|> >lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:
|> > ...
: : etc etc
: Even worse, code which was previously optimal for vector machines, and which
|> >:was OK on a wide variety of other machines, is now pessimal for these machines.

|> >Not really so new - I was optimizing codes for the cache in '87 for an IBM
|> >3090 with VF... ok, there ARE problems (the curve of leading dimension of
|> >array versus megaflops

|> You're still a long way off.  My *father* was optimizing Fortran matrix codes
|> to exploit the cache on the IBM 370/195, in the (guess?) mid-70's.  On that

Both posters have essentially the same point, and this point is well taken.
Machines with cache (and other locality-friendly) devices have been around
a *long* time.  Even the 360/67 got a boost from code rearrangement, due to
the DAT box (Dynamic Address Translation == "TLB", sort of) overhead if you
accessed arrays the wrong way.  On the new RISCs, the effect is extremely
strong.  Combined with some of the vector-ish features of these machines,
optimal codes can look like a hybrid of the cache and vector techniques,
which makes them rather non-intuitive.

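
[Illustrative sketch, not part of the original post.]  The point above about
accessing arrays "the wrong way" can be shown with a toy model.  The Python
below walks a Fortran-style (column-major) matrix in both loop orders and
counts misses in a small direct-mapped cache.  The cache parameters (32-byte
lines, 64 lines) and the function names are assumptions made up for the
example; they do not describe the 360/67, the 3090, the 370/195, or any RISC
mentioned in this thread.

```python
def column_major_offsets(rows, cols, elem_bytes=8, inner="rows"):
    """Yield byte offsets of A(i, j) for a column-major (Fortran) array.

    inner="rows": inner loop runs down a column (unit stride).
    inner="cols": inner loop runs along a row (stride = rows elements).
    """
    if inner == "rows":
        for j in range(cols):
            for i in range(rows):
                yield (i + j * rows) * elem_bytes
    else:
        for i in range(rows):
            for j in range(cols):
                yield (i + j * rows) * elem_bytes


def count_misses(offsets, line_bytes=32, num_lines=64):
    """Count misses in a hypothetical direct-mapped cache (assumed sizes)."""
    tags = [None] * num_lines          # which memory line each slot holds
    misses = 0
    for off in offsets:
        line = off // line_bytes       # memory line containing this byte
        slot = line % num_lines        # direct-mapped: one possible slot
        if tags[slot] != line:
            tags[slot] = line          # evict whatever was there
            misses += 1
    return misses


if __name__ == "__main__":
    n = 256
    good = count_misses(column_major_offsets(n, n, inner="rows"))
    bad = count_misses(column_major_offsets(n, n, inner="cols"))
    print("unit-stride misses:", good)
    print("strided misses:    ", bad)
```

With these made-up sizes, the unit-stride order misses once per cache line
(one miss per four 8-byte elements), while the strided order misses on every
access, because a stride of 256 elements sends every reference in the inner
loop to the same cache slot.  Real machines add associativity, TLBs, and
prefetching, so treat this only as a picture of the mechanism.
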
I agree that this is nothing new.  The major problem for computer
architects, from the beginning, has been where to put the bandwidth.  The
new RISC-with-fast-cache machines have properties somewhat like a
minicomputer with an attached array processor.  If your problem is well
suited to this, you can get phenomenal speedups very cheaply.  If your
problem does not have such locality, but is still vectorizable, a "vector
supercomputer" architecture may be a better approach.

What I am really arguing in favor of is a machine which combines both.
There is no reason why you can't have a machine with both a superscalar CPU
driven mainly off cache, and a vector load-store architecture that can
access secondary memory directly.  Then, you get the best of both worlds.
The question is, when will this be done on a microprocessor?

To the sometimes-heard statement that "superscalar makes vector obsolete",
my answer is that it *could*, just as a very fast Turing machine could.  In
order to actually *do it*, however, the load/store architecture will have
to be expanded considerably.  No one has yet succeeded in getting that much
concurrency going in a superscalar machine.  But I wouldn't argue that it
couldn't be done.  In fact, I would like to see it.

As for the other criticism, that VLIW machines make vector obsolete, I
agree.  The Multiflow architecture could potentially have made "vector"
machines obsolete.  In fact, it is really too bad that they went out of
business.  Someone ought to be working on a single-chip VLIW, if they
aren't already.  But I haven't heard of anyone.  In many ways, VLIW seems
to be a simpler and more general form of "vectorization".

|>  -David Hinds
|>   dhinds@cb-iris.stanford.edu

--
  Hugh LaMaster, M/S 233-9,     UUCP:                ames!lamaster
  NASA Ames Research Center     Internet:            lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035       With Good Mailer:    lamaster@george.arc.nasa.gov
  Phone:  415/604-6117                               #include