Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!amdcad!weitek!sci!kenm
From: kenm@sci.UUCP (Ken McElvain)
Newsgroups: comp.arch
Subject: Re: Load/Branch ratio [was Re: 486 and 68040]
Summary: invariants across architectures
Message-ID: <44621@sci.UUCP>
Date: 30 Apr 89 07:01:29 GMT
References: <17131@cup.portal.com> <12435@reed.UUCP> <3913@mipos3.intel.com> <18253@winchester.mips.COM>
Organization: Silicon Compilers Systems Corp. San Jose, Ca
Lines: 45

In article <18253@winchester.mips.COM>, mash@mips.COM (John Mashey) writes:
> In article <25428@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
> >In article <18201@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
> >| Yes, certainly a good tradeoff; loads are more frequent than branches.
> 
> >Interesting -- what kind of numbers do you see?  On the Am29000, we tend
> >to see just the opposite, although they are somewhat close:
> 
> On R3000s, we see grossly similar effects, but the comment was directed to
> the 386/486 chips.  I.e.:
> 	a) Across typical micro architectures, the NUMBER of branches
> 	would be grossly equal, even with different compiler technology
> 	with the major exception of loop-unrolling effects in real loopy code.
> 	b) The NUMBER of loads/stores, however, can vary quite a bit,
> 		affected by:
> 		1) The number of registers available at once
> 		2) Register windows/stack caches/etc for subroutine calls
> 		3) Global optimization technology
> 		4) The nature of the program, i.e., some loads and stores
> 		can be eliminated by optimizers or windows, some won't go
> 		away no matter what you do.

One measurement that I have always disliked because of its incompleteness
is the cache hit percentage.  A better measure is the number of cache
misses during a run of a given program.  With a given cache size and
organization, this should be relatively constant across different CPU
architectures (within limits; changing the sizes of data types would affect things).
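
To make "number of misses for a given cache size and organization"
concrete, here is a minimal sketch of a trace-driven cache simulator.  The
function name count_misses, the choice of a direct-mapped organization, and
the address trace are all hypothetical, picked only for illustration; the
miss count it produces is a function of the address stream and the cache
geometry alone.

```python
def count_misses(trace, num_lines, line_size):
    """Count misses for a direct-mapped cache of num_lines lines of line_size bytes."""
    tags = [None] * num_lines  # resident tag per line; None = empty line
    misses = 0
    for addr in trace:
        block = addr // line_size   # memory block holding this byte address
        index = block % num_lines   # the only line this block can occupy
        tag = block // num_lines    # distinguishes blocks sharing that line
        if tags[index] != tag:
            misses += 1
            tags[index] = tag       # allocate the block on a miss
    return misses


# Hypothetical program: sweep a 64-word (256-byte) array twice, 4-byte words.
trace = [4 * i for i in range(64)] * 2

print(count_misses(trace, num_lines=32, line_size=16))  # 512-byte cache: 16
print(count_misses(trace, num_lines=8, line_size=16))   # 128-byte cache: 32
```

Shrinking the cache from 512 to 128 bytes doubles the misses because the
array no longer survives between sweeps.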

Architectural changes that affect the number of loads/stores would pretty
directly affect the cache hit percentage.  A small number of registers would
lead to an inflated percentage of hits, since the extra loads/stores mostly
hit in the cache while the number of misses stays about the same.  Better
use of registers should lead to lower hit rates.
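
The inflation effect is simple arithmetic.  In the sketch below (a
hypothetical function name and made-up counts, not measurements), two
machines run the same program with the same cache and so take the same
number of misses; the one issuing more loads/stores reports the better hit
percentage while doing no better at avoiding memory traffic.

```python
def hit_percentage(references, misses):
    """Percentage of memory references that hit in the cache."""
    return 100.0 * (references - misses) / references


# Hypothetical numbers: same program, same cache, so the same 1000 misses.
# Machine B issues twice as many loads/stores (e.g. extra saves and reloads
# forced by a smaller register file), and nearly all of them hit.
misses = 1000
print(hit_percentage(10000, misses))  # machine A: 90.0
print(hit_percentage(20000, misses))  # machine B: 95.0
```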

One should be able to get an idea of how well registers are doing
their job for a particular architecture/compiler from the cache size and
organization, the number of cache misses, and the percentage of hits,
since the miss count and hit percentage together give the total number of
loads/stores.  A comparison of register windows and simple register files
comes to mind.
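
The inference above can be written out.  With hit percentage h, miss count
M and reference count R, h = 100(R - M)/R, which rearranges to
R = 100M/(100 - h).  A minimal sketch, with hypothetical names and made-up
numbers, that recovers the load/store count and compares two configurations
running the same program on the same cache:

```python
def references_from(misses, hit_pct):
    """Recover total loads/stores from a miss count and a hit percentage.

    hit_pct = 100 * (refs - misses) / refs rearranges to
    refs = 100 * misses / (100 - hit_pct).
    """
    return 100.0 * misses / (100.0 - hit_pct)


# Hypothetical measurements, e.g. a register-window machine and one with a
# flat register file; which is which is deliberately left open here.
config_a = references_from(misses=1000, hit_pct=95.0)  # 20000.0 references
config_b = references_from(misses=1000, hit_pct=90.0)  # 10000.0 references

# With equal misses, the configuration needing fewer loads/stores is the
# one keeping more values in registers.
print(config_a / config_b)  # 2.0
```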

How about some data?


Ken McElvain
{decwrl,weitek}!sci!kenm