Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!exodus!rbbb.Eng.Sun.COM!chased
From: chased@rbbb.Eng.Sun.COM (David Chase)
Newsgroups: comp.arch
Subject: What the compiler won't do you for you
Message-ID: <8658@exodus.Eng.Sun.COM>
Date: 26 Feb 91 19:40:01 GMT
References: <10244@dog.ee.lbl.gov> <1991Feb25.203629.5059@linus.mitre.org> <10278@dog.ee.lbl.gov> <22605:Feb2608:04:5391@kramden.acf.nyu.edu>
Sender: news@exodus.Eng.Sun.COM
Organization: Sun Microsystems, Mt. View, Ca.
Lines: 74

(Even though I'm replying to Dan, I'll be polite. Truth is stranger
than fiction, eh?)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Observation 2: Whoever writes the optimizer for FUBAR---let's call this
>guy Natas---*could* make every single FUBAR instruction available from
>Z. All he has to do is make sure that there's *some* language construct,
>perhaps ridiculously convoluted, that will compile to each instruction.
>
>Observation 3: Natas rarely does this. I have yet to see any compiler
>understand any portable expression of Hamming weights, even though this
>wouldn't be very difficult. Even in the occasional case where the
>compiler writer shows some evidence of having used the assembly language
>he's compiling for, he rarely tells programmers what the optimizer can
>and can't do.
>
>Let's assume that Natas is not such a devil, and manages not only to
>give his optimizer some way of using FUBAR's CXi operation and some way
>of using FUBAR's DREM operation, but also to *document* (gasp) the
>corresponding expressions in Z. Now Joe can write portable Z code that
>will run fast on FUBAR, taking advantage of the chip's instructions. All
>he has to do is follow Natas's instructions.

There's one problem with this: finite resources (if you or Herman Rubin
wants to pony up a dump-truck full of cash, maybe we can talk, but I'll
bet your resources are finite, too).
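Dan's "portable expression of Hamming weights" is easy to make concrete. Here is a minimal C sketch, assuming a C99 compiler for `<stdint.h>`; the name `popcount_loop` is hypothetical. It counts one bits by repeatedly clearing the lowest set bit. It is ordinary portable C. Whether a given optimizer recognizes the loop and replaces it with a single population-count instruction is exactly the question at issue, so nothing here promises that it will.

```c
#include <stdint.h>

/* Portable Hamming weight (population count) of a 32-bit value.
 * x & (x - 1) clears the lowest set bit of x, so the loop body
 * runs once per 1 bit and the count equals the Hamming weight.
 * An optimizer *could* recognize this idiom and emit a single
 * population-count instruction on a machine that has one; this
 * code does not depend on that happening. */
static unsigned popcount_loop(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}
```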
I think you'll agree that no compiler writer should spend time
optimizing for the machine-specific case until most of the optimization
has been done for the portable (standard-conforming) case.

So, after we've taken care of reduction in strength, global value
numbering, constant propagation, redundancy elimination, loop invariant
code motion, instruction selection, register allocation, scheduling,
loop unrolling, loop straightening, peephole improvements, software
pipelining, linear function test replacement, loop fusion, stripmining,
blocking, dead code elimination, tail call elimination, leaf routine
optimization -- oops, better algorithms appeared, so we have to
reimplement a couple of those -- oops, new machine, time to fool around
with the scheduling and instruction selection some more -- oops, time to
do some fancy code placement to help out the cache -- oops, time to do
interprocedural optimization. I think you get the picture.

Optimization effort always has to be directed toward the things that
will make the largest number of present and future customers happy.
When all the work is done for C, Fortran, Pascal, Modula, and C++, is it
better to expose non-portable machine-specific optimizations to the
programmer, or should we look at extending the optimizer to be useful to
Lisp, Ada, Cobol, RPG-III, Prolog, and Jovial? Maybe we'd make people
happier if the optimizer ran twice as fast, or in half the memory. Maybe
we should use some of the dataflow algorithms to provide debugging
feedback to the user ("there's no exit from this loop; perhaps it won't
terminate?").

Another difficulty with the scheme that you describe is that you really
don't want people to be writing their code in strange little idioms just
because that will engage some magic widget in the optimizer. It's
portable, but weird. That doesn't help readability or debuggability, and
it may not be portable into the future.
If someone rewrites the optimizer, the last thing they want to do is
support weird little hacks like that.

If you must (and some people must), then use something based on
subroutine calls. That has two advantages: (1) if no machine support
exists, it is easy to plug in portable code, and (2) tools often exist
for substituting an assembly-language implementation at the call (for
instance, say "man inline" on a Sun).

Of course, there are some things that "ought" to be recognized because
they are portable, but it still isn't clear that they are needed, or
worth the cost. For instance, on the Sparc we *could* spill register %i7
(return PC) and use that register for other purposes in the main body of
a subroutine. Fine, except that we'd break all the debuggers, including
adb. That adds big costs to the final debugging that goes on before a
product (ours, or some software vendor's) is shipped. Probably not worth
it.

David Chase
Sun
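The subroutine-call approach can be sketched in C, assuming a C99 compiler. Every caller writes `bit_count(x)` (a hypothetical name), and the choice between machine support and portable code lives in one place. Sun's own mechanism is the inline expansion template facility behind `man inline`. As a stand-in that is easier to try on current toolchains, this sketch swaps in the GCC/Clang builtin `__builtin_popcount` when `__GNUC__` is defined, and otherwise falls back to a portable bit-parallel sum.

```c
#include <stdint.h>

/* Hypothetical interface: callers only ever see this declaration. */
unsigned bit_count(uint32_t x);

#if defined(__GNUC__)
/* GCC and Clang provide __builtin_popcount(unsigned int), which the
 * compiler may turn into a single instruction when the target has one,
 * or into a library call when it does not. */
unsigned bit_count(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}
#else
/* Portable fallback: add bits in parallel within the word.
 * Step 1 leaves 2-bit counts, step 2 leaves 4-bit counts,
 * step 3 leaves 8-bit counts, and the multiply sums the four
 * bytes into the top byte. */
unsigned bit_count(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (unsigned)((x * 0x01010101u) >> 24);
}
#endif
```

The odd-looking fallback is the kind of idiom Chase would rather not see scattered through application code; here it is confined to one function that can be replaced when the optimizer or the machine changes.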