Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!wuarchive!emory!gatech!mcnc!duke!egr.duke.edu!rfk
From: rfk@egr.duke.edu (Robert F. Krick)
Newsgroups: comp.arch
Subject: Re: Smart I-cache?
Message-ID: <1157@cameron.egr.duke.edu>
Date: 1 Nov 90 18:53:57 GMT
References: <2823@crdos1.crd.ge.COM>
Organization: Duke University EE Dept.; Durham, NC
Lines: 47

From article <2823@crdos1.crd.ge.COM>, by davidsen@crdos1.crd.ge.COM 
(Wm E Davidsen Jr):
> 
>   The feature is intelligent I-cache, which only stores instructions
> which are the target of jumps. 
>
> [stuff deleted]
> 
>   If anyone has any info on recent work (if any) I'd like to hear it. If
> there are any good papers I should look up I'd like to see them, too.
> Obviously this must either be harder to do than I think, or provide less
> benefit, or everyone would be doing it.

Although it is not a "smart I-cache", Dr. Apostolos Dollas (my advisor) and
I have been working to *eliminate* the performance degradation associated
with I-cache misses and branch penalties.  To accomplish this, we have
developed the Sustained Performance Architecture (SPA).  This architecture
is based on the concept that program flow information (extracted prior to
execution) can be used to identify those basic blocks that are candidates
for execution within the next n cycles (where n is the memory latency in
cycles).  The instructions from these basic blocks can be prefetched from
multiple banks of instruction memory so that the required instructions are
available when the processor needs them.  (Of course, some instructions
will be prefetched unnecessarily, but these are not delivered to the
processor.)  In essence, SPA leverages information available when the
program is compiled in order to improve performance.
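
As a rough illustration of the candidate-identification idea only (this is
not the SPA hardware mechanism, and every name below is hypothetical), the
following sketch walks a control flow graph and collects the basic blocks
that could begin executing within n cycles of the current block.  It assumes
one instruction per cycle and that the graph and block lengths were
extracted before execution.

```python
import heapq

def prefetch_candidates(cfg, lengths, current, n):
    """Return {block: earliest_start_cycle} for blocks reachable within n cycles.

    cfg     -- dict: block name -> list of successor block names (assumed
               to come from compile-time flow analysis)
    lengths -- dict: block name -> cycles needed to execute the block
               (assumption: one instruction per cycle)
    current -- block whose first instruction executes at cycle 0
    n       -- memory latency in cycles
    """
    earliest = {current: 0}          # earliest cycle at which a block can start
    heap = [(0, current)]
    while heap:
        start, block = heapq.heappop(heap)
        if start > earliest.get(block, float("inf")):
            continue                 # stale heap entry
        successor_start = start + lengths[block]
        if successor_start > n:
            continue                 # successors fall outside the n-cycle window
        for succ in cfg.get(block, []):
            if successor_start < earliest.get(succ, float("inf")):
                earliest[succ] = successor_start
                heapq.heappush(heap, (successor_start, succ))
    # The current block is already being fetched, so it is not a candidate.
    return {b: t for b, t in earliest.items() if b != current}

# Hypothetical five-block program: A branches to B or C, both rejoin at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
lengths = {"A": 2, "B": 3, "C": 1, "D": 4, "E": 2}
candidates = prefetch_candidates(cfg, lengths, "A", 4)
```

With a latency of n = 4, the candidates are B and C (both can start at
cycle 2) and D (reachable at cycle 3 through the short block C).  Block E
lies outside the window and would not yet be prefetched.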

Since the program flow information is extracted prior to execution, SPA
cannot *improve* the performance for branches whose targets are defined
dynamically (e.g., jmp r2).  With the notable exception of subroutine
returns, which SPA can handle without any loss in performance, the
degradation associated with this class of branch instructions is
insignificant (<< 1%), because such instructions are sufficiently rare in
code compiled from languages such as FORTRAN and C.

We have developed a working `proof of concept' prototype.  In addition,
we have applied for patent protection in both the USA and Japan.  If you
would like more details about this architecture, please see our article
in the December '89 issue of the ACM "Computer Architecture News".  If
there is enough demand, I will make additional technical reports available
via ftp.

========================================================================
Robert F. Krick; Dept. of Elec. Eng.; Duke University; Durham, NC  27706
    Internet:  rfk@cameron.egr.duke.edu 	AT&T: (919)660-5268

	  "When the branching gets conditional, the prefetching 
	   gets tough, and the tough get going!"