Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!samsung!aplcen!mef
From: mef@aplcen.apl.jhu.edu (Marty Fraeman)
Newsgroups: comp.arch
Subject: Re: Software modularity vs. instruction locality
Message-ID: <3887@aplcen.apl.jhu.edu>
Date: 15 Nov 89 16:10:18 GMT
References: <17707@watdragon.waterloo.edu> <23604@cup.portal.com> <6374@dime.cs.umass.edu>
Reply-To: mef@aplcen (Marty Fraeman)
Distribution: na
Organization: Johns Hopkins University
Lines: 49

In article <6374@dime.cs.umass.edu> shri@ccs1.cs.umass.edu (H.Shrikumar{shri@ncst.in}) writes:
>In article <1TMk2X#Qggn6=eric@snark.uu.net> eric@snark.uu.net (Eric S. Raymond)
>writes:
>>In <1989Nov4.004529.10049@ico.isc.com> Dick Dunn wrote:
>>> Second, I would expect better locality
>>> for code reference than for data reference, hence the I cache ought to do
>>> more good than the D cache. Aren't the pathological cache-busting programs
>>> generally ones which spray data accesses all over the place?
>>
>>Not necessarily. There's a subtle problem here; good software modularity
>>practices tend to hurt code locality. If you're calling subroutines a lot
>>in generated code the PC jumps all over the shop.
>
>This happens for example in a FORTH machine, FORTH typically is
>subroutine threaded, so there is a flurry of subroutine calls
>happening at about 4 million a second. (in a 8-10 Mhz (?) Novix 2016
>Forth CPU).
>
>In forth there is a subroutine call every five or so instructions
>I would guess.

We have looked at a similar issue in Forth. On the SC32 Forth engine,
over 90% of the runs of sequentially accessed code are less than 6.25
instructions long. This machine can execute most Forth primitives with
a single one-cycle instruction, except for load and store, which take
two cycles. Subroutine calls take one cycle, and most returns take zero
cycles since they can generally be combined with another instruction.
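
For anyone who wants to try the same kind of measurement, here is a
rough sketch in Python of how run lengths could be pulled out of an
instruction-address trace. It is only an illustration, not the tool
behind the SC32 figures: the trace is made up, the function names are
hypothetical, and it assumes word-addressed code where a sequential
fetch is simply the previous address plus one.

```python
def sequential_run_lengths(trace, step=1):
    """Return the lengths of consecutive-address runs in a PC trace.

    Assumption: a fetch is "sequential" when its address equals the
    previous address plus `step`; anything else (call, return, branch)
    ends the current run and starts a new one.
    """
    if not trace:
        return []
    runs = []
    length = 1
    for prev, cur in zip(trace, trace[1:]):
        if cur == prev + step:
            length += 1          # still inside the same straight-line run
        else:
            runs.append(length)  # control transfer: close the run
            length = 1
    runs.append(length)          # close the final run
    return runs


def fraction_shorter_than(runs, limit):
    """Fraction of runs whose length is strictly below `limit`."""
    if not runs:
        return 0.0
    return sum(1 for r in runs if r < limit) / len(runs)


# Made-up trace: four straight-line runs separated by jumps.
trace = [100, 101, 102, 200, 201, 300, 301, 302, 303, 103, 104]
runs = sequential_run_lengths(trace)
print(runs)
print(fraction_shorter_than(runs, 4))
```

Fed a real trace, fraction_shorter_than(runs, 6.25) would give the
kind of figure quoted above.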

We also looked at the effectiveness of instruction caches on this
machine and found that fairly small caches (<16KB) could still achieve
>95% hit rates. However, since the programs we studied were fairly
modest in size, our I-cache size result should be taken with a grain of
salt. On the other hand, the programs we studied were comparable in
size to single threads in typical real-time applications we've
developed in the past, so I believe there is some significance to our
data.

As a final comment on I-cache effectiveness in Forth, keep in mind that
while Forth instruction traces hop all over the place, the hierarchical
nature of most Forth implementations keeps code size much smaller than
usual.

	Marty Fraeman

	mef@aplcen.apl.jhu.edu
	301-953-5000, x8360

	JHU/Applied Physics Laboratory
	Johns Hopkins Road
	Laurel, Md. 20707