Xref: utzoo comp.lang.fortran:5104 comp.unix.cray:291 comp.sys.super:310
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uwm.edu!convex.csd.uwm.edu!bruno
From: bruno@convex.csd.uwm.edu (Bruno Wolff III)
Newsgroups: comp.lang.fortran,comp.unix.cray,comp.sys.super
Subject: Re: Fortran optimization
Message-ID: <10775@uwm.edu>
Date: 4 Apr 91 21:31:02 GMT
References: <1991Apr3.062644.23436@eagle.lerc.nasa.gov>
Sender: news@uwm.edu
Followup-To: comp.lang.fortran
Organization: University of Wisconsin-Milwaukee Computing Services Division
Lines: 15

In article <1991Apr3.062644.23436@eagle.lerc.nasa.gov> fsjohnv@alliant1.lerc.nasa.gov (Richard Mulac) writes:
%
%   While optimizing a code last year I ran across a situation which on the
%surface appears to contradict some of the basic rules one abides by when
%trying to write efficient Fortran code.  I was trying to merge several DO
%loop blocks to eliminate the overhead associated with redundant index
%calculations and also to reduce memory access.  The code was written to
%be run on a CRAY Y-MP, compiling with cft77 4.0.1.6.  What I found can
%be described using the following program TEST, which calls subroutines
%JOINED and SPLIT (It's a few hundred lines long, but the majority of the
%code is redundant).

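Possibly this is caused by the merged loop no longer fitting in the
instruction cache.  A loop that fits is fetched once and then runs out of
the cache, while one that doesn't has to refetch instructions on every
trip.  So several small loops, each of which fits entirely in the
instruction cache, can be more efficient than one larger loop which doesn't.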
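
To make the two shapes concrete, here is a rough sketch. The names TEST,
JOINED and SPLIT are borrowed from the quoted description, but the loop
bodies, array names and sizes are made up for illustration and are not the
original code. With bodies this small both versions fit easily; the effect
would only be expected once the merged body grows large.

```fortran
      PROGRAM TEST
C     Illustrative sketch only: loop bodies and sizes are assumed,
C     not taken from the original program.
      INTEGER N
      PARAMETER (N = 1000)
      REAL A(N), B(N), C1(N), D1(N), C2(N), D2(N)
      INTEGER I
      DO 10 I = 1, N
         A(I) = REAL(I)
         B(I) = 2.0 * REAL(I)
   10 CONTINUE
      CALL JOINED(N, A, B, C1, D1)
      CALL SPLIT(N, A, B, C2, D2)
C     Both versions do the same arithmetic, so results must match.
      DO 20 I = 1, N
         IF (C1(I) .NE. C2(I) .OR. D1(I) .NE. D2(I)) STOP 1
   20 CONTINUE
      PRINT *, 'JOINED and SPLIT agree'
      END

      SUBROUTINE JOINED(N, A, B, C, D)
      INTEGER N, I
      REAL A(N), B(N), C(N), D(N)
C     One merged loop: less index overhead, but a larger loop body.
      DO 10 I = 1, N
         C(I) = A(I) + B(I)
         D(I) = A(I) * B(I)
   10 CONTINUE
      END

      SUBROUTINE SPLIT(N, A, B, C, D)
      INTEGER N, I
      REAL A(N), B(N), C(N), D(N)
C     Two small loops: more index overhead, but each body is short.
      DO 10 I = 1, N
         C(I) = A(I) + B(I)
   10 CONTINUE
      DO 20 I = 1, N
         D(I) = A(I) * B(I)
   20 CONTINUE
      END
```

Timing each subroutine separately, with realistically large loop bodies,
would show whether the split form actually wins on a given machine.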