Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!gem.mps.ohio-state.edu!apple!oliveb!mipos3!omepd!mipon2!rmb
From: rmb@mipon2.intel.com (Bob Bentley)
Newsgroups: comp.arch
Subject: Re: Software modularity vs. instruction locality
Message-ID: <5164@omepd.UUCP>
Date: 9 Nov 89 19:42:28 GMT
References: <17707@watdragon.waterloo.edu> <23604@cup.portal.com> <1989Nov2.190900.29144@world.std.com> <1989Nov4.004529.10049@ico.isc.com> <1TMk2X#Qggn6=eric@snark.uu.net>
Sender: news@omepd.UUCP
Reply-To: rmb@mipon2.UUCP (Bob Bentley)
Distribution: usa
Organization: Intel Corp., Hillsboro, Oregon
Lines: 42

In article <1TMk2X#Qggn6=eric@snark.uu.net> eric@snark.uu.net (Eric S. Raymond) writes:
>
> ... There's a subtle problem here; good software modularity
>practices tend to hurt code locality. If you're calling subroutines a lot
>in generated code the PC jumps all over the shop.
>
>I have no statistics on this, but I can easily imagine something like, say,
>the inner loop of a threaded-code interpreter busting hell out of the I cache
>because it's doing the equivalent of an indexed call indirect to some far-off
>routine every couple of instructions.
>
>Has anyone done any systematic investigation of this issue?
>--

We did a pretty thorough study of cache behavior at BiiN (now, alas, defunct).
This was prompted by the large variance which we observed in cache hit ratios
for different benchmarks; in particular, OS-intensive benchmarks were *much*
worse than CPU-only (Dhrystone, etc.) or Unix utility (grep, diff, etc.) tests.

There were a number of contributing causes to the observed cache behavior
(use of a sub-sectored caching scheme was a major one, since it led to an
effective cache occupancy of < 40%). However, the nature of OS code was
certainly a factor. The BiiN OS was written in Ada in a highly modular
fashion. The first versions of the OS contained considerable amounts of
diagnostic code, and very few routines were inlined.
The result was that there was very little spatial or temporal locality in the
OS code. Calls and branches were very frequent; tight loops were very rare.
Though not as bad as Eric's hypothetical example, the effect on cache hit
ratio, and hence on overall system performance, was still significant.

There are some definite negative performance implications to modular
programming techniques which need to be kept in mind. (This is not to say
that I am opposed to modular programming techniques, especially for projects
as large and complex as the BiiN OS - as a former colleague of mine used to
ask, "Do you want it to go fast or do you want it to work?")

	Bob Bentley

--------------------------------------------------------------------------------
|  Intel Corp., M/S JF1-58                   UUCP:  rmb@omefs3.intel.com       |
|  2111 N.E. 25th Avenue                     Phone: (503) 696-4728             |
|  Hillsboro, Oregon 97124                   Fax:   (503) 696-4515             |
--------------------------------------------------------------------------------