Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!uwvax!rang From: rang@cs.wisc.edu (Anton Rang) Newsgroups: comp.lang.c Subject: Loop optimization (was Re: Array bounds checking with C????) Summary: This is a well-known optimization from the FORTRAN world Message-ID: Date: 7 Sep 90 16:54:43 GMT References: <1589@redsox.bsw.com> <1990Sep6.203031.29254@laguna.ccsf.caltech.edu> Sender: news@spool.cs.wisc.edu Organization: UW-Madison CS department Lines: 59 In-reply-to: jimmy@tybalt.caltech.edu's message of 6 Sep 90 20:30:31 GMT In article <1990Sep6.203031.29254@laguna.ccsf.caltech.edu>, jimmy@tybalt.caltech.edu (Jimmy Hu) writes: >I would expect the two fragments to generate different codes. [ ... ] >I don't think that the compiler will "convert" or "optimize" the >second to the first or vice versa. It should, if the first will run faster--and on most modern machines, it will. On machines with autoincrement, the pointer version becomes something like: loop: clr.b (r0)+ ; Clear a byte, autoincrement R0 cmp.l r0,r1 ; Are we at the end of the array yet? bleq loop ; No, keep going The second loop, if compiled naively, generates code more like loop: clr.b (r0)[d0.l] ; Clear a byte using indexed addressing inc.l d0 ; Increment i cmp.l d0,d1 ; Done yet? bleq loop ; No, not yet This is likely to run slower on nearly all machines. (The above code is intended to reflect a typical CISC machine, say a 68020 or VAX.) >where is it going to get a pointer from? Create one? Yes, it should. >How will i become ARRAYSIZE after exiting the loop? If i is used after the loop, there should be an instruction *after* the loop which moves ARRAYSIZE into i. If i is used inside the loop for something, the compiler has to decide between using indexed addressing for the array, or maintaining an address register synchronized to i (if i isn't assigned to within the loop, and there are extra registers available, this is fairly easy). >I think that the first fragment will run faster on most machines, >since the address (array+i) doesn't have to be computed at each step. Exactly; this is why we want the compiler to convert the array version into the pointer version. That way, programmers can write it more clearly (or at least, perhaps more clearly to them), and it will still run at the same speed. (On some machines, though I don't know of any, it could even be that the pointer version would be slower, and the compiler should then convert case (1) into (2).) This particular optimization is well-known, and has been used in FORTRAN compilers for some time. It's called "induction variable elimination"; check out pp. 466-471 of the original Dragon book for an explanation and (simplistic) algorithms. (It's easier in FORTRAN since aliasing is rarer.) Anton +---------------------------+------------------+-------------+ | Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison | +---------------------------+------------------+-------------+