Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!ames!amdahl!pacbell!lll-winken!vette!brooks
From: brooks@vette.llnl.gov (Eugene Brooks)
Newsgroups: gnu.gcc
Subject: Re: Instruction Scheduling and Branch Scheduling
Message-ID: <27050@lll-winken.LLNL.GOV>
Date: 17 Jun 89 18:52:21 GMT
References: <8906170506.AA29525@yahi>
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: brooks@maddog.llnl.gov (Eugene Brooks)
Distribution: gnu
Organization: Lawrence Livermore National Laboratory
Lines: 37

In article <8906170506.AA29525@yahi> tiemann@lurch.stanford.edu writes:
>I don't know if this is the right place to post technical information
>about GNU CC, but here goes...

WARNING: THIS IS A PURELY TECHNICAL RESPONSE TO A PURELY TECHNICAL POSTING
ALL YOU POLITICAL TYPES HIT THE KILL KEY NOW!
I APOLOGIZE IN ADVANCE FOR POSTING THIS TO gnu.gcc.

I snarfed Tiemann's paper across the net and read it. I hope that this
paper appears as part of the documentation of the GCC distribution at
some point. Perhaps one could start collecting and stuffing these things
in a subdirectory. At the risk of suggesting that net bandwidth be tied
up with such technical stuff, I suggest that Tiemann actually post his
paper to gnu.gcc for those without ftp access.

Instruction scheduling is a very machine-specific operation because it
depends so strongly on the exact characteristics of the target machine.
For the recent pipelined RISC chips with short pipes, Tiemann's results
would appear to apply well. However, for supercomputers and next year's
RISC chips, where the pipeline delays for loads and floating point
operations are a bit longer, some of the scheduling tricks which were
tried and found not to work will have a much different performance
character.
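
To make the dependence on pipeline delay concrete, here is a toy sketch in C.
It is not taken from GCC or from Tiemann's paper; the machine model (in-order,
one instruction issued per clock, hardware interlocks) and every name in it
(struct machine, count_cycles, the latency values) are assumptions made up for
illustration. It counts the clocks needed to issue the same four instructions
in two orders, on a hypothetical short-pipe and a hypothetical long-pipe target.

```c
#include <assert.h>

#define NREGS  16
#define NO_REG (-1)

enum op_kind { OP_LOAD, OP_ALU };

struct insn {
    enum op_kind kind;
    int dest;           /* register written */
    int src1, src2;     /* registers read, or NO_REG */
};

/* Hypothetical per-machine parameters: clocks from issue until the
   result can be used by a later instruction. */
struct machine {
    int load_latency;
    int alu_latency;
};

static int max_int(int a, int b) { return a > b ? a : b; }

/* Clocks needed to issue all N instructions on an in-order,
   single-issue machine that stalls until source operands are ready. */
int count_cycles(const struct insn *code, int n, const struct machine *m)
{
    int ready[NREGS] = {0};   /* clock at which each register is available */
    int cycle = 0;            /* earliest clock the next insn may issue */

    for (int i = 0; i < n; i++) {
        int issue = cycle;
        assert(code[i].dest >= 0 && code[i].dest < NREGS);
        if (code[i].src1 != NO_REG)
            issue = max_int(issue, ready[code[i].src1]);
        if (code[i].src2 != NO_REG)
            issue = max_int(issue, ready[code[i].src2]);
        ready[code[i].dest] = issue +
            (code[i].kind == OP_LOAD ? m->load_latency : m->alu_latency);
        cycle = issue + 1;
    }
    return cycle;
}

/* Each load immediately followed by its use. */
static const struct insn naive_order[] = {
    { OP_LOAD, 1, NO_REG, NO_REG },
    { OP_ALU,  3, 1, 1 },
    { OP_LOAD, 2, NO_REG, NO_REG },
    { OP_ALU,  4, 2, 2 },
};

/* Both loads hoisted ahead of both uses. */
static const struct insn hoisted_order[] = {
    { OP_LOAD, 1, NO_REG, NO_REG },
    { OP_LOAD, 2, NO_REG, NO_REG },
    { OP_ALU,  3, 1, 1 },
    { OP_ALU,  4, 2, 2 },
};

static const struct machine short_pipe = { 2, 1 };  /* assumed latencies */
static const struct machine long_pipe  = { 4, 1 };  /* assumed latencies */
```

Calling count_cycles(naive_order, 4, &short_pipe) gives 6 clocks and the
hoisted order gives 4, so on the short pipe the reordering hides the load
delay completely. On long_pipe the same two orders take 10 and 6 clocks: the
hoisted order still stalls for two clocks, and only a more aggressive
schedule with more independent work between load and use would remove them.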

It would appear that any attempt to do a "machine independent"
implementation of instruction scheduling might require a couple of dials
that could be set from within machine-dependent files, say output.c,
which would control the action of the scheduling algorithms.

Tiemann's first cut would work well on a Cyber 205, which has really long
pipeline delays and lots of registers. Tiemann's last cut clearly works
well on the SPARC and will probably work well on the MIPS and other chips
with short pipes. The pipeline delays of the 88K are longer, for instance
three clocks for a load and six for floating point, and the first cut
might perform differently on this chip.

brooks@maddog.llnl.gov, brooks@maddog.uucp