Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!cs.utexas.edu!uunet!stanford.edu!leland.Stanford.EDU!elaine18.Stanford.EDU!dhinds
From: dhinds@elaine18.Stanford.EDU (David Hinds)
Newsgroups: comp.arch
Subject: Re: RISC vs. CISC -- SPECmarks
Message-ID: <1991May7.052417.10606@leland.Stanford.EDU>
Date: 7 May 91 05:24:17 GMT
References: <1991Apr30.163153.18568@midway.uchicago.edu> <1991May2.162909.9165@news.arc.nasa.gov> <819@cadlab.sublink.ORG>
Sender: news@leland.Stanford.EDU (Mr News)
Organization: Stanford University - AIR
Lines: 25

In article <819@cadlab.sublink.ORG> martelli@cadlab.sublink.ORG (Alex Martelli) writes:
>lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:
>	...
>:I have only limited experience with the new, fast-only-in-cache, machines,
>:but I have to say that the code you need to get optimum performance is
>:even more non-intuitive than that for the older vector architecture machines.
>:Even worse, code which was previously optimal for vector machines, and which
>:was OK on a wide variety of other machines, is now pessimal for these machines.
>
>Not really so new - I was optimizing codes for the cache in '87 for an IBM
>3090 with VF... ok, there ARE problems (the curve of leading dimension of
>array versus megaflops bounces up and down wildly and unpredictably for many,
>many 'normal' patterns of memory access - FAR from intuitive!)...

    You're still a long way off.  My *father* was optimizing Fortran matrix
codes to exploit the cache on the IBM 370/195, in the (guess?) mid-70's.  On
that machine, correct loop ordering and blocking to fit the cache produced
like a 20-fold speed improvement on matrix multiply, without any need for
assembly language.  This was quite a fast machine for its time, as I
understand.  But what seemed obvious at the time seems to have been
rediscovered with great fanfare several times since then.  The IBM RS-6000
also has a lot in common with the 195 architecture, apparently.

 -David Hinds
  dhinds@cb-iris.stanford.edu
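
For readers unfamiliar with the "loop ordering and blocking" mentioned above, here is a minimal sketch of the idea. It is not the Fortran code the post refers to, which is not shown anywhere in the thread. The function names and the `block` parameter are hypothetical, chosen only for illustration. The matrices are Python lists of rows, so the inner loops run i-k-j; column-major Fortran would want a different order. Pure Python will not reproduce the speedup described, so this shows only the loop structure.

```python
def matmul_naive(a, b, n):
    """Textbook i-j-k multiply of two n x n matrices (lists of rows)."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                # b[k][j] walks down a column: poor locality for row storage.
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, n, block=4):
    """Same product, computed tile by tile.

    `block` is a hypothetical tile edge; on real hardware it would be
    chosen so that the tiles of a, b and c in use fit in cache together.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):          # tile row of c
        for kk in range(0, n, block):      # tile along the shared dimension
            for jj in range(0, n, block):  # tile column of c
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        aik = a[i][k]
                        # Innermost loop walks rows of b and c contiguously.
                        for j in range(jj, min(jj + block, n)):
                            c[i][j] += aik * b[k][j]
    return c
```

Both functions compute the same product. The blocked version only changes the order in which elements are visited, so that each tile is reused while it is still resident in cache.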