Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!mentor.cc.purdue.edu!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: Dynamic frequency analysis (was Register usage)
Summary: what static analysis looks like
Keywords: compiler analysis, expected execution counts
Message-ID: <11743@pur-ee.UUCP>
Date: 24 May 89 15:23:49 GMT
References: <966@aber-cs.UUCP>
Reply-To: hankd@pur-ee.UUCP (Hank Dietz)
Distribution: eunet,world
Organization: Purdue University Engineering Computer Network
Lines: 95


In article <966@aber-cs.UUCP> you write:
>In other words, "used in a loop" is a good estimator of dynamic frequency
>of use, and spares the optimizer from having to cache everything in sight
>to get a good chance of caching all the variables that matter.

"Used in a loop" is not the flip side to dynamic collection of stats; static
analysis of expected execution count is.  Where possible, the compiler
should compute the exact number of iterations made by loops, etc.  My PhD
thesis, and other papers by other folk, show how to do this.

It is only when the exact analysis is THEORETICALLY IMPOSSIBLE that an
approximate scheme is used.  For example:

f()
{
	{A}
	if (B) {
		{C}
		while (D) {
			{E}
		}
		if (F) {
			{G}
		}
	} else {
		{H}
	}
	{I}
	while (J) {
		{K}
	}
	{L}
}

In the above code, let's assume that no range information, etc., is available
about the expressions B, D, F, and J.  Then one might say that the expected
execution counts are:

code	E(count)	comments
----	--------	--------
{A}	1		actually 1 times E(function f)
B	1
{C}	0.51		as per Multiflow, 0.51/0.49 then/else
D	0.51*101	as per Multiflow, 100 iterations per while
{E}	0.51*100
F	0.51
{G}	0.51*0.51	simple composition
{H}	0.49
{I}	1
J	1*101
{K}	1*100
{L}	1

This generates the following partial ordering of regions based on expected
execution count:

J		101
{K}		100
D		 51.51
{E}		 51
{A},B,{I},{L}	  1
{C},F		  0.51
{H}		  0.49
{G}		  0.2601

Hence, despite having NO range analysis, it will be recognized that the while
loop J,{K} is more important than the while loop D,{E}.  Now, you might argue
that you can easily create a program where this would be incorrect...  but
the point is that it is very rare that this analysis would mislead you AND
getting a few dynamic statistics would correct the error.  In other words,
dynamic statistics are just about as unreliable as static guesses...  it
might be depressing, but that's how it is most of the time.

Of course, the above expected counts can be further weighted by the expected
spill costs, etc.

In fact, a compiler doesn't allocate VARIABLES to registers -- it allocates
VALUES to registers -- hence all these issues really are rather artificial.
Interesting register allocation problems only arrise when the number of
simultaneously live values which are good candidates for registers exceeds
the number of registers available.  This does happen, but not as often as
you might think....

>Well, I may be sentimental, but I still prefer "register" and my own
>judgment. On the other hand, of course, "performance-by-design" is a win
>only if it is done. If not, an optimizer is just a fixup, but may be useful
>("read carefully the instructions...").

Use register wherever possible -- it means "no address, no alias, and
non-volatile."  That can help the compiler understand your program better.
As for you coding better than an optimizing compiler, well, congratulations,
John Henry. ;-)

						-hankd@ee.ecn.purdue.edu