Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!uwvax!rang
From: rang@cs.wisc.edu (Anton Rang)
Newsgroups: comp.lang.c
Subject: Loop optimization (was Re: Array bounds checking with C????)
Summary: This is a well-known optimization from the FORTRAN world
Message-ID: <RANG.90Sep7115443@jaguar.cs.wisc.edu>
Date: 7 Sep 90 16:54:43 GMT
References: <MEISSNER.90Aug31102636@osf.osf.org>
	<DAVEG.90Aug31224630@near.cs.caltech.edu> <1589@redsox.bsw.com>
	<1990Sep6.203031.29254@laguna.ccsf.caltech.edu>
Sender: news@spool.cs.wisc.edu
Organization: UW-Madison CS department
Lines: 59
In-reply-to: jimmy@tybalt.caltech.edu's message of 6 Sep 90 20:30:31 GMT

In article <1990Sep6.203031.29254@laguna.ccsf.caltech.edu>,
  jimmy@tybalt.caltech.edu (Jimmy Hu) writes:
>I would expect the two fragments to generate different codes. [ ... ]
>I don't think that the compiler will "convert" or "optimize" the
>second to the first or vice versa.

  It should, if the first will run faster--and on most modern
machines, it will.  On machines with autoincrement, the pointer
version becomes something like:

loop:	clr.b	(r0)+		; Clear a byte, autoincrement R0
	cmp.l	r0,r1		; Are we at the end of the array yet?
	bleq	loop		; No, keep going

  The second loop, if compiled naively, generates code more like

loop:	clr.b	(r0)[d0.l]	; Clear a byte using indexed addressing
	inc.l	d0		; Increment i
	cmp.l	d0,d1		; Done yet?
	bleq	loop		; No, not yet

  This is likely to run slower on nearly all machines.  (The above
code is intended to reflect a typical CISC machine, say a 68020 or
VAX.)

>where is it going to get a pointer from? Create one?

  Yes, it should.

>How will i become ARRAYSIZE after exiting the loop?

  If i is used after the loop, there should be an instruction *after*
the loop which moves ARRAYSIZE into i.  If i is used inside the loop
for something, the compiler has to decide between using indexed
addressing for the array, or maintaining an address register
synchronized to i (if i isn't assigned to within the loop, and there
are extra registers available, this is fairly easy).

>I think that the first fragment will run faster on most machines,
>since the address (array+i) doesn't have to be computed at each step.

  Exactly; this is why we want the compiler to convert the array
version into the pointer version.  That way, programmers can write it
more clearly (or at least, perhaps more clearly to them), and it will
still run at the same speed.  (On some machines, though I don't know
of any, it could even be that the pointer version would be slower, and
the compiler should then convert case (1) into (2).)

  This particular optimization is well-known, and has been used in
FORTRAN compilers for some time.  It's called "induction variable
elimination"; check out pp. 466-471 of the original Dragon book for an
explanation and (simplistic) algorithms.  (It's easier in FORTRAN
since aliasing is rarer.)

	Anton
   
+---------------------------+------------------+-------------+
| Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison |
+---------------------------+------------------+-------------+