Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!mcnc!duke!rfk
From: rfk@duke.cs.duke.edu (Robert F. Krick)
Newsgroups: comp.arch
Subject: Re: Smart I-cache?
Summary: description of the SPA concept and it benefits
Message-ID: <657720712@lear.cs.duke.edu>
Date: 4 Nov 90 12:11:54 GMT
References: <2823@crdos1.crd.ge.COM> <1157@cameron.egr.duke.edu>
Organization: Duke University Computer Science Dept.; Durham, N.C.
Lines: 90

I have collected several followups to my last posting. Here are my replies:

In article <39324@ucbvax.BERKELEY.EDU>, jbuck@galileo.berkeley.edu (Joe Buck) writes:
> Object-oriented languages are becoming more popular; there are many more
> indirect branches (in C++, these are virtual function calls) in such
> languages. However, I have no statistics on what fraction of instructions
> in such programs are indirect branches, and this may vary widely from
> program to program. You'd do very poorly on Forth code, but I doubt that
> you care all that much. :-)

Indeed, the Sustained Performance Architecture (SPA) that I described may
not be appropriate for all languages, because it requires that "most" of
the program flow be known prior to execution. As I said, SPA cannot
*improve* performance for portions of a program in which branches have
dynamically defined targets. A special-purpose i-cache could easily be
incorporated into SPA to handle these situations explicitly, in the same
manner as other architectures do. Whether to include such an i-cache would
depend on how frequent these dynamically defined branch targets are and on
the penalty each one incurs.

>> We have developed a working `proof of concept' prototype. In addition,
>> we have applied for patent protection in both the USA and Japan.
> Have you gotten it to the point where you can run the SPECmarks on it
> (meaning that you've finished Fortran and C compilers that can provide
> the scheduling information)?

The goal of our `proof of concept' prototype was to demonstrate that
critical aspects of the architecture could be implemented in hardware. We
accomplished this goal by building a working system. We have no plans to
develop the existing prototype further into a system capable of running
the SPECmarks. However, we are actively seeking licensing and/or funding
that would enable us to develop the next-generation prototype, namely, a
full-fledged general-purpose computer. My current research involves
designing and implementing the necessary `scheduling' algorithms for the
compiler.

In article <13120@encore.Encore.COM>, jcallen@encore.Com (Jerry Callen) writes:
> This is, of course, the AMD29K branch target cache, which stores the
> target of the branch and the next 3 instructions.

As you realize, no technique that is heuristic or based on locality of
reference can be 100% effective. As a result, there will *always* be some
performance degradation with techniques such as branch prediction,
i-caches, branch target caches, etc. In contrast, the SPA concept
addresses the *cause* of performance degradation in instruction
sequencing: it looks ahead in the program flow to prefetch every
instruction that may be required. As the disparity between processor
speed and main memory speed increases, heuristic and locality-based
solutions are becoming increasingly expensive, and IMHO they will
eventually give way to structured solutions that provide higher
performance at lower cost.

In article <2829@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>> In article <1157@cameron.egr.duke.edu> rfk@egr.duke.edu (Robert F. Krick)
>> writes:
>> Since the program flow information is extracted prior to execution, SPA
>> cannot *improve* the performance for branches which have dynamically
>> defined targets (i.e. jmp r2).
>> With the notable exception of subroutine
>> returns which SPA can handle without any loss in performance, the degradation
>> associated with this class of branch instructions is insignificant ( << 1%),
>> because these instructions are sufficiently rare in compiled code from
>> languages such as FORTRAN and C.
>
> register int (*state)();
>
> /* code */
> if (m < n) (*state)(foo, mumble, barf);
>
> I agree that this is an infrequent case.

Thanks for your support! There is another (possibly more frequent)
situation in which branches with dynamically defined targets may be used:
a branch table implementing the switch/case construct in C. For small
branch tables, it is more appropriate in SPA to generate a binary decision
tree of conditional branches instead (despite the slight increase in code
size). Because every branch in such a tree has targets that are known
before execution, the tree can then be executed without any loss in
performance.

========================================================================
Robert F. Krick; Dept. of Elec. Eng.; Duke University; Durham, NC 27706
Internet: rfk@cameron.egr.duke.edu  AT&T: (919)660-5268
"When the branching gets conditional, the prefetching gets tough, and the
tough get going!"