Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!amdcad!weitek!sci!kenm
From: kenm@sci.UUCP (Ken McElvain)
Newsgroups: comp.arch
Subject: Re: Load/Branch ratio [was Re: 486 and 68040]
Summary: invariants across architectures
Message-ID: <44621@sci.UUCP>
Date: 30 Apr 89 07:01:29 GMT
References: <17131@cup.portal.com> <12435@reed.UUCP> <3913@mipos3.intel.com> <18253@winchester.mips.COM>
Organization: Silicon Compilers Systems Corp. San Jose, Ca
Lines: 45

In article <18253@winchester.mips.COM>, mash@mips.COM (John Mashey) writes:
> In article <25428@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
> >In article <18201@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
> >| Yes, certainly a good tradeoff; loads are more frequent than branches.
>
> >Interesting -- what kind of numbers do you see?  On the Am29000, we tend
> >to see just the opposite, although they are somewhat close:
>
> On R3000s, we see grossly similar effects, but the comment was directed to
> the 386/486 chips.  I.e.:
>     a) Across typical micro architectures, the NUMBER of branches
>        would be grossly equal, even with different compiler technology,
>        with the major exception of loop-unrolling effects in real
>        loopy code.
>     b) The NUMBER of loads/stores, however, can vary quite a bit,
>        affected by:
>         1) The number of registers available at once
>         2) Register windows/stack caches/etc for subroutine calls
>         3) Global optimization technology
>         4) The nature of the program, i.e., some loads and stores
>            can be eliminated by optimizers or windows, some won't go
>            away no matter what you do.

One measurement that I have always disliked because of its
incompleteness is the cache hit percentage.  A better measure is the
number of cache misses during a run of a given program.  With a given
cache size and organization this should be relatively constant across
different CPU architectures (within limits; changing the sizes of data
types would affect things).
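
To put numbers on why the hit percentage alone is incomplete, here is a
minimal sketch in Python.  The function name hit_percentage and all of
the counts are made up for illustration, not measurements; it assumes
the miss count stays fixed while the number of loads/stores changes.

```python
def hit_percentage(accesses, misses):
    """Cache hit percentage for one run: 100 * (accesses - misses) / accesses."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    if not 0 <= misses <= accesses:
        raise ValueError("misses must be between 0 and accesses")
    return 100.0 * (accesses - misses) / accesses


# Hypothetical counts: same program, same cache, so the same number of misses.
MISSES = 50_000

# Assumed: a register-poor machine issues twice as many loads/stores.
few_registers = hit_percentage(2_000_000, MISSES)
many_registers = hit_percentage(1_000_000, MISSES)

print(f"few registers:  {few_registers:.1f}% hits, {MISSES} misses")
print(f"many registers: {many_registers:.1f}% hits, {MISSES} misses")
```

With the same 50,000 misses, the run that makes twice as many
references reports 97.5% hits against 95.0%, though both make the same
number of trips to memory.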

Architectural changes that affect the number of loads/stores would
pretty directly affect the cache hit percentage.  A small number of
registers would lead to an inflated percentage of hits, since the extra
loads and stores it forces mostly hit in the cache.  Better use of
registers should lead to lower hit rates.  One should be able to get an
idea of how well registers are doing their job for a particular
architecture/compiler from the cache size and organization, the number
of cache misses, and the percentage of hits.  A comparison of register
windows and simple register files comes to mind.

How about some data?

Ken McElvain
{decwrl,weitek}!sci!kenm