Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!cs.utexas.edu!uunet!stanford.edu!leland.Stanford.EDU!elaine18.Stanford.EDU!dhinds
From: dhinds@elaine18.Stanford.EDU (David Hinds)
Newsgroups: comp.arch
Subject: Re: RISC vs. CISC -- SPECmarks
Message-ID: <1991May7.052417.10606@leland.Stanford.EDU>
Date: 7 May 91 05:24:17 GMT
References: <1991Apr30.163153.18568@midway.uchicago.edu> <1991May2.162909.9165@news.arc.nasa.gov> <819@cadlab.sublink.ORG>
Sender: news@leland.Stanford.EDU (Mr News)
Organization: Stanford University - AIR
Lines: 25

In article <819@cadlab.sublink.ORG> martelli@cadlab.sublink.ORG (Alex Martelli) writes:
>lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:
>	...
>:I have only limited experience with the new, fast-only-in-cache, machines,
>:but I have to say that the code you need to get optimum performance is
>:even more non-intuitive than that for the older vector architecture machines.
>:Even worse, code which was previously optimal for vector machines, and which
>:was OK on a wide variety of other machines, is now pessimal for these machines.
>
>Not really so new - I was optimizing codes for the cache in '87 for an IBM
>3090 with VF... ok, there ARE problems (the curve of leading dimension of
>array versus megaflops bounces up and down wildly and unpredictably for many,
>many 'normal' patterns of memory access - FAR from intuitive!)...

You're still a long way off.  My *father* was optimizing Fortran matrix codes
to exploit the cache on the IBM 370/195 in, I'd guess, the mid-70's.  On that
machine, correct loop ordering and blocking to fit the cache produced something
like a 20-fold speed improvement on matrix multiply, without any need for
assembly language.  This was quite a fast machine for its time, as I understand
it.  But
what seemed obvious at the time seems to have been rediscovered with great
fanfare several times since then.  The IBM RS/6000 also apparently has a lot
in common with the 195 architecture.
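
For anyone who hasn't seen the trick, here is a minimal sketch of what "loop
ordering and blocking" means for matrix multiply.  It is written in C rather
than Fortran, so storage is row-major and the favorable loop order is the
mirror image of what you'd want in Fortran.  The function names and the block
size parameter bs are my own, purely for illustration; this is not the
original 195 code, and the right block size depends on the cache in question.

```c
#include <assert.h>
#include <stddef.h>

/* Naive i-j-k ordering.  With row-major storage the inner loop reads b
   with a stride of n doubles, so it tends to use the cache poorly. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Blocked version: work on bs-by-bs tiles so the pieces of a, b and c in
   use can stay in cache, and use i-k-j order inside a tile so the inner
   loop runs along rows of b and c with unit stride. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(bs > 0);

    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                /* Clip the tile at the matrix edge when bs does not divide n. */
                size_t imax = (ii + bs < n) ? ii + bs : n;
                size_t kmax = (kk + bs < n) ? kk + bs : n;
                size_t jmax = (jj + bs < n) ? jj + bs : n;

                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

Both routines compute the same product; only the order in which memory is
touched differs.  How much that buys you depends on the machine, the matrix
size and the block size, so time it on your own hardware rather than taking
any particular factor on faith.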

 -David Hinds
  dhinds@cb-iris.stanford.edu