Path: utzoo!mnetor!tmsoft!torsqnt!news-server.csri.toronto.edu!clyde.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!hsdndev!spdcc!ima!iecc!compilers-sender From: henry@zoo.toronto.edu (Henry Spencer) Newsgroups: comp.compilers Subject: Re: Vax span-dependent jumps Keywords: optimize, code, assembler Message-ID: <1990Dec30.005507.7842@zoo.toronto.edu> Date: 30 Dec 90 00:55:07 GMT References: <9012111913.AA07310@cs-sun-fsa.cpsc.ucalgary.ca> <14692@goofy.megatest.UUCP> <28765@mimsy.umd.edu> Sender: compilers-sender@iecc.cambridge.ma.us Reply-To: henry@zoo.toronto.edu (Henry Spencer) Organization: U of Toronto Zoology Lines: 38 Approved: compilers@iecc.cambridge.ma.us In article <28765@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes: >a special optimization, the assembler will sometimes% use something called >`branch tunneling'... Properly known as branch chaining; it's been around since at least the Bliss-11 compiler. It wins on code size but typically loses on speed, especially on modern machines where extra pipeline breaks really hurt. Our moderator writes: >[... I wrote the original AIX assembler for the RT PC, another machine with >span-dependent jumps, by mutating the System III Vax assembler. The code to >do span-dependent jumps was pretty straightforward, in pass 1 it assumed >they'd all be the long form and made a table of all of them, then between >passes 1 and 2 iterated over that table looking for branches that could be >converted to the short form... General comment: this is theoretically the wrong approach, since there are sequences where the decisions interact so that branch X can be short only if branch Y is also short, and a one-at-a-time algorithm that starts with all of them long won't shorten them. In practice, almost all branches are short and *any* reasonable algorithm will work well enough. Things like iterative scanning of tables are overkill; a simple FIFO buffer in the output stage, with branches made long if if they are pushed out the end before their target is known, will get almost all of them. -- Henry Spencer at U of Toronto Zoology, henry@zoo.toronto.edu utzoo!henry [It's true, the scheme that I used in the AIX assembler won't produce optimal results, but since the code already was there and worked, I went with the flow. On the other hand, perfect branch optimization is NP complete so barring major positive news in the P=NP department, any perfect method is very slow. A paper by Tom Szymanski in the CACM ten or so years ago tells more than you'd ever want to know about the topic, I can dig up the reference if there's interest. -John] -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.