Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!ames!amdahl!pacbell!lll-winken!vette!brooks
From: brooks@vette.llnl.gov (Eugene Brooks)
Newsgroups: gnu.gcc
Subject: Re: Instruction Scheduling and Branch Scheduling
Message-ID: <27050@lll-winken.LLNL.GOV>
Date: 17 Jun 89 18:52:21 GMT
References: <8906170506.AA29525@yahi>
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: brooks@maddog.llnl.gov (Eugene Brooks)
Distribution: gnu
Organization: Lawrence Livermore National Laboratory
Lines: 37

In article <8906170506.AA29525@yahi> tiemann@lurch.stanford.edu writes:
>I don't know if this is the right place to post technical information
>about GNU CC, but here goes...
WARNING: THIS IS A PURELY TECHNICAL RESPONSE TO A PURELY TECHNICAL POSTING.
ALL YOU POLITICAL TYPES HIT THE KILL KEY NOW!  I APOLOGIZE IN ADVANCE FOR
POSTING THIS TO gnu.gcc.


I snarfed Tiemann's paper across the net and read it.  I hope that this paper
appears as part of the documentation of the GCC distribution at some point.
Perhaps one could start collecting and stuffing these things in a subdirectory.
At the risk of suggesting that net bandwidth be tied up with such technical
stuff, I suggest that Tiemann actually post his paper to gnu.gcc for those
without ftp access.

Instruction scheduling is a very machine specific operation because it
depends so strongly on the exact characteristics of the target machine.
For the recent pipeline RISC chips with short pipes, Tiemann's results would
appear to apply well.  However, for supercomputers and next year's RISC chips
where the pipeline delays for loads and floating point operations
are a bit longer, some of the scheduling tricks which were tried
and found not to work will have a much different performance character.

It would appear that any attempt to do a "machine independent" implementation
of instruction scheduling might require a couple of dials that could be set
from within machine dependent files, say output.c, which would control the
action of the scheduling algorithms.  Tiemann's first cut would work well
on a Cyber 205 which has really long pipeline delays and lots of registers.
Tiemann's last cut clearly works well on the SPARC and will probably work
well on the MIPS and other chips with short pipes.  The pipeline delays
of the 88K are longer, for instance three clocks for a load and six for
floating point, and the first cut might have different performance results
for this chip.
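
To make the "dials" idea concrete, here is a rough sketch in C.  It is not
taken from GCC or from Tiemann's paper.  The names sched_params, insn and
schedule_length are hypothetical, the machine model (single issue, in order,
stall until operands are ready) is an assumption, and the short-pipe
latencies are made-up illustrative values.  Only the three-clock load and
six-clock floating point figures come from the 88K numbers above.

```c
#include <stddef.h>

/* Hypothetical per-machine "dials": result latencies in clocks. */
struct sched_params {
    int load_latency;
    int fp_latency;
    int alu_latency;
};

enum insn_class { INSN_ALU, INSN_LOAD, INSN_FP };

struct insn {
    enum insn_class cls;
    int dest;           /* register written, or -1 for none */
    int src1, src2;     /* registers read, or -1 for none */
};

#define NREGS 32

static int latency_of(const struct sched_params *p, enum insn_class c)
{
    switch (c) {
    case INSN_LOAD: return p->load_latency;
    case INSN_FP:   return p->fp_latency;
    default:        return p->alu_latency;
    }
}

/* Clocks needed to issue CODE in the given order on the assumed machine:
   one instruction per clock, stalling until source registers are ready. */
int schedule_length(const struct sched_params *p,
                    const struct insn *code, size_t n)
{
    int ready[NREGS] = {0};   /* clock at which each register is usable */
    int clock = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int issue = clock;
        if (code[i].src1 >= 0 && ready[code[i].src1] > issue)
            issue = ready[code[i].src1];
        if (code[i].src2 >= 0 && ready[code[i].src2] > issue)
            issue = ready[code[i].src2];
        if (code[i].dest >= 0)
            ready[code[i].dest] = issue + latency_of(p, code[i].cls);
        clock = issue + 1;
    }
    return clock;
}

/* Illustrative dial settings; short_pipe values are assumptions. */
static const struct sched_params short_pipe = { 2, 3, 1 };
static const struct sched_params long_pipe  = { 3, 6, 1 };  /* 88K-like */

/* A load, its consumer, and two independent adds, in three orders. */
static const struct insn naive[] = {
    { INSN_LOAD, 1, -1, -1 }, { INSN_ALU, 2, 1, 1 },
    { INSN_ALU,  3,  4,  5 }, { INSN_ALU, 6, 7, 8 },
};
static const struct insn hoist_one[] = {
    { INSN_LOAD, 1, -1, -1 }, { INSN_ALU, 3, 4, 5 },
    { INSN_ALU,  2,  1,  1 }, { INSN_ALU, 6, 7, 8 },
};
static const struct insn hoist_two[] = {
    { INSN_LOAD, 1, -1, -1 }, { INSN_ALU, 3, 4, 5 },
    { INSN_ALU,  6,  7,  8 }, { INSN_ALU, 2, 1, 1 },
};
```

Under these assumptions, schedule_length(&short_pipe, ...) gives 5, 4 and 4
clocks for naive, hoist_one and hoist_two, while schedule_length(&long_pipe,
...) gives 6, 5 and 4.  In this toy model, moving a second independent
instruction ahead of the load's consumer buys nothing on the short pipe but
saves a clock on the longer one.  That is the kind of difference a dial in
the machine dependent files would have to capture.  The sketch ignores
register pressure, which is presumably why having lots of registers matters
for the more aggressive approach.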


brooks@maddog.llnl.gov, brooks@maddog.uucp