Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!gem.mps.ohio-state.edu!apple!oliveb!mipos3!omepd!mipon2!rmb
From: rmb@mipon2.intel.com (Bob Bentley)
Newsgroups: comp.arch
Subject: Re: Software modularity vs. instruction locality
Message-ID: <5164@omepd.UUCP>
Date: 9 Nov 89 19:42:28 GMT
References: <17707@watdragon.waterloo.edu> <23604@cup.portal.com> <1989Nov2.190900.29144@world.std.com> <1989Nov4.004529.10049@ico.isc.com> <1TMk2X#Qggn6=eric@snark.uu.net>
Sender: news@omepd.UUCP
Reply-To: rmb@mipon2.UUCP (Bob Bentley)
Distribution: usa
Organization: Intel Corp., Hillsboro, Oregon
Lines: 42

In article <1TMk2X#Qggn6=eric@snark.uu.net> eric@snark.uu.net (Eric S. Raymond) writes:
>
>             ... There's a subtle problem here; good software modularity
>practices tend to hurt code locality. If you're calling subroutines a lot
>in generated code the PC jumps all over the shop.
>
>I have no statistics on this, but I can easily imagine something like, say,
>the inner loop of a threaded-code interpreter busting hell out of the I cache
>because it's doing the equivalent of an indexed call indirect to some far-off
>routine every couple of instructions.
>
>Has anyone done any systematic investigation of this issue?
>-- 

We did a pretty thorough study of cache behavior at BiiN (now, alas, defunct).
This was prompted by the large variance which we observed in cache hit ratios
for different benchmarks; in particular, OS-intensive benchmarks were *much*
worse than CPU-only (Dhrystone, etc.) or Unix utility (grep, diff, etc.) tests.

There were a number of contributing causes to the observed cache behavior
(use of a sub-sectored caching scheme was a major one, since it led to an
effective cache occupancy of < 40%).  However, the nature of OS code was
certainly a factor.  The BiiN OS was written in Ada in a highly modular fashion.
The first versions of the OS contained considerable amounts of diagnostic code,
and very few routines were inlined.  The result was that there was very little
spatial or temporal locality in the OS code.  Calls/branches were very
frequent, tight loops were very rare.   Though not as bad as Eric's hypothetical
example, the effect on cache hit ratio and hence on overall system performance 
was still significant.  There are some definite negative performance
implications to modular programming techniques which need to be kept in mind.
(This is not to say that I am opposed to modular programming techniques,
especially for projects as large and complex as the BiiN OS - as a former
colleague of mine used to ask, "Do you want it to go fast or do you want it to
work?").
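A toy simulation can make the two effects above concrete: frequent calls to far-off routines, and a sub-sectored cache that fills only the sub-blocks actually touched. This sketch is not a model of the BiiN cache or OS. The cache geometry (4 KB, direct-mapped, 64-byte sectors split into four 16-byte sub-blocks, 4-byte instructions) and both address traces are hypothetical. The second trace loosely imitates Eric's threaded-code inner loop: two dispatch instructions, then an indirect call to a short routine at a random distant address. "Occupancy" here means the fraction of sub-blocks holding valid code among the sector frames that have been allocated.

```python
import random


class SectoredICache:
    """Direct-mapped, sub-sectored instruction cache (hypothetical geometry).

    One address tag per sector; each sector is divided into sub-blocks that
    are fetched individually on a miss, each with its own valid bit.
    """

    def __init__(self, num_sectors=64, sub_blocks=4, sub_block_bytes=16):
        self.num_sectors = num_sectors          # 64 x 64-byte sectors = 4 KB (assumed)
        self.sub_blocks = sub_blocks            # sub-blocks per sector (assumed)
        self.sub_block_bytes = sub_block_bytes  # bytes per sub-block (assumed)
        self.tags = [None] * num_sectors
        self.valid = [[False] * sub_blocks for _ in range(num_sectors)]
        self.hits = 0
        self.misses = 0

    def fetch(self, addr):
        sector_bytes = self.sub_blocks * self.sub_block_bytes
        sector_addr = addr // sector_bytes
        index = sector_addr % self.num_sectors
        tag = sector_addr // self.num_sectors
        sub = (addr // self.sub_block_bytes) % self.sub_blocks
        if self.tags[index] == tag and self.valid[index][sub]:
            self.hits += 1
            return
        self.misses += 1
        if self.tags[index] != tag:
            # Sector replacement: claim the frame and invalidate all its sub-blocks.
            self.tags[index] = tag
            self.valid[index] = [False] * self.sub_blocks
        self.valid[index][sub] = True  # fill only the sub-block that missed

    def hit_ratio(self):
        return self.hits / (self.hits + self.misses)

    def occupancy(self):
        """Fraction of sub-blocks holding valid code, over allocated sectors only."""
        allocated = [v for t, v in zip(self.tags, self.valid) if t is not None]
        if not allocated:
            return 0.0
        filled = sum(sum(v) for v in allocated)
        return filled / (len(allocated) * self.sub_blocks)


def tight_loop_trace(iterations=100, base=0x1000, body_bytes=256):
    """A small loop body executed repeatedly: high spatial and temporal locality."""
    for _ in range(iterations):
        yield from range(base, base + body_bytes, 4)  # 4-byte instructions


def threaded_dispatch_trace(steps=2000, routines=200, seed=1):
    """Hypothetical threaded-code interpreter: a 2-instruction dispatch loop
    that calls a 3-instruction routine at a random far-off address every step.
    Routines are placed on 64-byte (sector) boundaries in a 1 MB code space."""
    rng = random.Random(seed)
    bases = [rng.randrange(0, 1 << 20, 64) for _ in range(routines)]
    for _ in range(steps):
        yield 0x0
        yield 0x4  # the indexed indirect call
        target = rng.choice(bases)
        yield from (target, target + 4, target + 8)


def simulate(trace):
    cache = SectoredICache()
    for addr in trace:
        cache.fetch(addr)
    return cache.hit_ratio(), cache.occupancy()


if __name__ == "__main__":
    for name, trace in [("tight loop", tight_loop_trace()),
                        ("threaded dispatch", threaded_dispatch_trace())]:
        ratio, occ = simulate(trace)
        print(f"{name:18s} hit ratio {ratio:.3f}  occupancy {occ:.2f}")
```

With these assumed parameters, the loop misses only on the first touch of each of its 16 sub-blocks and fills every sub-block of the sectors it allocates (occupancy 1.0). Every fetch in the dispatch trace lands in the first sub-block of its sector, so only one sub-block in four is ever valid (occupancy 0.25), and its hit ratio falls well below the loop's. Real code layouts would be less extreme, but the direction of both effects is the same.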

	Bob Bentley

--------------------------------------------------------------------------------
|	Intel Corp., M/S JF1-58			UUCP:   rmb@omefs3.intel.com   |
|	2111 N.E. 25th Avenue			Phone:  (503) 696-4728         |
|	Hillsboro, Oregon 97124			Fax:    (503) 696-4515         |
--------------------------------------------------------------------------------