Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!wuarchive!emory!gatech!mcnc!duke!egr.duke.edu!rfk
From: rfk@egr.duke.edu (Robert F. Krick)
Newsgroups: comp.arch
Subject: Re: Smart I-cache?
Message-ID: <1157@cameron.egr.duke.edu>
Date: 1 Nov 90 18:53:57 GMT
References: <2823@crdos1.crd.ge.COM>
Organization: Duke University EE Dept.; Durham, NC
Lines: 47

From article <2823@crdos1.crd.ge.COM>, by davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr):
>
>   The feature is intelligent I-cache, which only stores instructions
> which are the target of jumps.
>
> [stuff deleted]
>
>   If anyone has any info on recent work (if any) I'd like to hear it. If
> there are any good papers I should look up I'd like to see them, too.
> Obviously this must either be harder to do than I think, or provide less
> benefit, or everyone would be doing it.

Although not a "smart I-cache", Dr. Apostolos Dollas (my advisor) and I
have been working to *eliminate* the performance degradation associated
with I-cache misses and branch penalties.  To accomplish this, we have
developed the Sustained Performance Architecture (SPA).  This
architecture is based on the concept that program flow information
(extracted prior to execution) can be used to identify those basic
blocks which are candidates for execution within the next n cycles
(where n is the number of cycles of latency for the memory).  The
instructions from these basic blocks can be prefetched from multiple
banks of instruction memory such that the required instructions are
available when they are needed by the processor.  (Of course, some
instructions will be prefetched unnecessarily, but these are not
delivered to the processor.)  In essence, SPA leverages information
available when the program is compiled in order to improve performance.

Since the program flow information is extracted prior to execution, SPA
cannot *improve* the performance for branches which have dynamically
defined targets (i.e. jmp r2).
With the notable exception of subroutine returns, which SPA can handle
without any loss in performance, the degradation associated with this
class of branch instructions is insignificant (<< 1%), because these
instructions are sufficiently rare in compiled code from languages such
as FORTRAN and C.

We have developed a working `proof of concept' prototype.  In addition,
we have applied for patent protection in both the USA and Japan.  If you
would like more details about this architecture, please see our article
in the December '89 issue of the ACM "Computer Architecture News".
Based on demand, I will make additional technical reports available via
ftp.

========================================================================
Robert F. Krick; Dept. of Elec. Eng.; Duke University; Durham, NC 27706
Internet: rfk@cameron.egr.duke.edu      AT&T: (919)660-5268
"When the branching gets conditional, the prefetching gets tough,
 and the tough get going!"
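
A rough illustration of the "candidates for execution within the next n
cycles" idea described above.  This is only a sketch under stated
assumptions, not the SPA mechanism itself: it assumes the program flow
information is a static control-flow graph of basic blocks, that each
instruction takes one cycle, and that every branch target is known
before execution.  The names prefetch_candidates, cfg and lengths are
hypothetical and do not come from the SPA papers.

```python
import heapq


def prefetch_candidates(cfg, lengths, current, n):
    """Return {block: earliest_start} for every block that could begin
    executing within n cycles of `current` starting to execute.

    cfg     -- dict mapping a block name to a list of successor blocks
    lengths -- dict mapping a block name to its length in cycles
               (assumption: one instruction per cycle)
    n       -- memory latency in cycles
    """
    earliest = {current: 0}
    heap = [(0, current)]
    while heap:
        t, block = heapq.heappop(heap)
        if t > earliest[block]:
            continue  # stale heap entry; a shorter path was already found
        t_next = t + lengths[block]  # cycle at which a successor could start
        if t_next > n:
            continue  # successors lie beyond the latency window
        for succ in cfg.get(block, ()):
            if t_next < earliest.get(succ, n + 1):
                earliest[succ] = t_next
                heapq.heappush(heap, (t_next, succ))
    # The current block is already being executed, so it is not a candidate.
    return {b: t for b, t in earliest.items() if b != current}


# Hypothetical four-block program: A branches to B or C, both fall into D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
lengths = {"A": 2, "B": 3, "C": 6, "D": 4}

print(prefetch_candidates(cfg, lengths, "A", 5))
```

With a latency of 5 cycles, both branch targets (B and C) and the block
reachable through the short path (D) are candidates; whichever of B and
C is not taken would be prefetched unnecessarily and simply not
delivered to the processor.  A dynamically defined target such as
jmp r2 has no entry in cfg, so a table like this cannot help there.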