Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!deimos!uxc!uxc.cso.uiuc.edu!mcdurb!aglew From: aglew@mcdurb.Urbana.Gould.COM Newsgroups: comp.sys.m68k Subject: Re: addq.w #n,sp and a pop quiz Message-ID: <27100003@mcdurb> Date: 16 Dec 88 19:46:00 GMT References: <4350@Portia.Stanford.EDU> Lines: 77 Nf-ID: #R:Portia.Stanford.EDU:4350:mcdurb:27100003:000:2326 Nf-From: mcdurb.Urbana.Gould.COM!aglew Dec 16 13:46:00 1988 Ref: Superoptimizer -- a look at the smallest program Henry Massalin ASPLOS II Provides a number of code sequences that avoid branches. Branch avoidance becomes important on high performance processors. signum(x) (x in d0) add.l d0,d0 subx.l d1,d1 negx.l d0 addx.l d1,d1 (signum x in d1) abs(x) (x in d0) move.l d0,d1 add.l d1,d1 subx.l d1,d1 eor.l d1,d0 sub.l d1,d0 (abs(x) in d0) Longer but no jumps. Unfortunately, dependencies between nearly every instruction, but a good code scheduler may be able to interleave other code here. We have a good code scheduler, don't we, Dave? :-) Max(X,Y) (d0=X,d1=Y) sub.l d1,d0 subx.l d2,d2 or.l d2,d0 addx.l d1,d0 (d0 = max(X,Y) Min is similar. Simultaneous max/min MaxMin(X,Y) (d0=X, d1=Y) sub.l d1,d0 subx.l d2,d2 and.l d0,d2 eor.l d2,d0 add.l d1,d0 add.l d2,d1 (d0 = max, d1 = min) Massalin goes on to present some BCD code fragments, multiplication by constants, etc. I don't think that I have done an injustice to Massalin's paper by presenting these fragments -- many of them were known before his paper, by people like me who like playing with branch avoiding code sequences. Some were already well known tricks. Massalin's contribution is in having written a program that finds the best such sequence, with reasonable certainty. Ie. he has automated the search process that hackers used to do by inspiration. Superoptimizer is probably not the sort of thing that you would put into a production compiler, usually - most code simply does not benefit that much from it, plus, it is extremely slow. But, something like superoptimizer should almost definitely be run by a compiler group that is generating code fragments, for code generation or libraries, every time the machine parameters change (like, when a new chip comes out). Unless your machine is simple enough that the best code sequence is obvious by inspection. Andy "Krazy" Glew aglew@urbana.mcd.mot.com uunet!uiucdcs!mcdurb!aglew Motorola Microcomputer Division, Champaign-Urbana Design Center 1101 E. University, Urbana, Illinois 61801, USA. My opinions are my own, and are not the opinions of my employer, or any other organisation. I indicate my company only so that the reader may account for any possible bias I may have towards our products.