Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!mcnc!duke!rfk
From: rfk@duke.cs.duke.edu (Robert F. Krick)
Newsgroups: comp.arch
Subject: Re: Smart I-cache?
Summary: description of the SPA concept and its benefits
Message-ID: <657720712@lear.cs.duke.edu>
Date: 4 Nov 90 12:11:54 GMT
References: <2823@crdos1.crd.ge.COM> <1157@cameron.egr.duke.edu>
Organization: Duke University Computer Science Dept.; Durham, N.C.
Lines: 90


I have collected several follow-ups to my last posting.  Here are
some replies:

In article <39324@ucbvax.BERKELEY.EDU>, jbuck@galileo.berkeley.edu 
(Joe Buck) writes:
> Object-oriented languages are becoming more popular; there are many more
> indirect branches (in C++, these are virtual function calls) in such
> languages.  However, I have no statistics on what fraction of instructions
> in such programs are indirect branches, and this may vary widely from
> program to program.  You'd do very poorly on Forth code, but I doubt that
> you care all that much. :-)

Indeed, the Sustained Performance Architecture (SPA) that I described may 
not be appropriate for all languages, because it requires that "most" of 
the program flow be known prior to execution.  As I said, SPA cannot
*improve* performance for portions of the program where branches have 
dynamically defined targets.  A special-purpose i-cache could easily be 
incorporated in SPA to handle these situations explicitly, in the same 
manner as in other architectures.  Whether to incorporate such an i-cache
would depend on how frequent these dynamically defined branch targets are
and on the penalty associated with them.
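
To make the trade-off concrete, here is a small C sketch of what such a
special-purpose cache for dynamically defined targets might look like.
Its organization is an assumption, not part of SPA as described: a
hypothetical 16-entry direct-mapped table (ITC_ENTRIES), indexed by the
branch address with the low two bits dropped (assuming 4-byte
instructions), that remembers the last target seen at each indirect
branch.  Like any predictor of this kind, it misses on first use, on a
change of target, and when two branches collide in one slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ITC_ENTRIES 16  /* hypothetical number of slots */

struct itc_entry {
    uint32_t tag;     /* address of the indirect branch using this slot */
    uint32_t target;  /* last target observed for that branch */
    int valid;
};

struct itc {
    struct itc_entry slot[ITC_ENTRIES];
    unsigned hits;
    unsigned misses;
};

static void itc_init(struct itc *c)
{
    assert(c != NULL);
    for (int i = 0; i < ITC_ENTRIES; i++)
        c->slot[i].valid = 0;
    c->hits = 0;
    c->misses = 0;
}

/* Predict the target of the indirect branch at address `pc`, compare the
   prediction with the `actual` target, then remember `actual` for next
   time.  Returns 1 if the prediction was correct, 0 otherwise. */
static int itc_access(struct itc *c, uint32_t pc, uint32_t actual)
{
    assert(c != NULL);
    /* Direct-mapped: drop the low 2 bits (assumed 4-byte instructions). */
    struct itc_entry *e = &c->slot[(pc >> 2) % ITC_ENTRIES];
    int hit = e->valid && e->tag == pc && e->target == actual;
    if (hit)
        c->hits++;
    else
        c->misses++;
    e->tag = pc;
    e->target = actual;
    e->valid = 1;
    return hit;
}
```

Feeding a single call site at 0x100 the target sequence 0x2000, 0x2000,
0x3000, 0x2000, 0x2000 (as a virtual call on objects of two classes
might) gives 2 hits and 3 misses: one cold miss plus one miss for each
change of target.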


>> We have developed a working `proof of concept' prototype.  In addition, 
>> we have applied for patent protection in both the USA and Japan.

> Have you gotten it to the point where you can run the SPECmarks on it
> (meaning that you've finished Fortran and C compilers that can provide
> the scheduling information)?

The goal of our `proof of concept' prototype was to demonstrate that critical 
aspects of the architecture could be implemented in hardware.  We have 
accomplished this goal by building a working system.  We do not have any
plans to develop the existing prototype further into a system capable of 
executing the SPECmarks.  However, we are actively seeking licensing and/or 
funding that will enable us to develop the next-generation prototype, namely,
a full-fledged general-purpose computer.  My current research focuses on
designing and implementing the necessary `scheduling' algorithms for the compiler.


In article <13120@encore.Encore.COM>, jcallen@encore.Com (Jerry Callen) writes:
> This is, of course, the AMD29K branch target cache, which stores the
> target of the branch and the next 3 instructions. 

As you realize, any technique which is heuristic or based upon locality of 
reference cannot be 100% effective.  As a result, there will *always* be 
some performance degradation for techniques such as branch prediction, 
i-caches, branch target caches, etc.  In contrast, the SPA concept addresses 
the *cause* of performance degradation in instruction sequencing by looking 
ahead in the program flow to prefetch all instructions which may be required.  
As the disparity between processor speed and main memory speed increases, 
heuristic and locality-based solutions are becoming increasingly expensive 
and IMHO will eventually give way to structured solutions which provide 
higher performance at lower cost.
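
The contrast can be sketched in C.  The post does not describe SPA's
actual lookahead mechanism; this sketch only illustrates the general idea
of prefetching along every statically known path.  It assumes a
hypothetical control-flow graph (struct block, at most MAX_BLOCKS = 32
blocks) in which each block has at most two successors, a fall-through
and a branch-taken target, and it returns the set of blocks reachable
within `depth` branches.  In this idealized model, fetching that whole
set means neither outcome of any conditional branch in the window has to
wait on main memory.  A branch like jmp r2 has no entry in such a graph,
which is why dynamically defined targets fall outside the approach.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCKS 32  /* hypothetical limit so a set fits in a uint32_t */

/* A basic block with up to two statically known successors:
   the fall-through path and the branch-taken path (-1 if absent). */
struct block {
    int fall;
    int taken;
};

/* Return, as a bit mask, the blocks reachable from `start` by following
   at most `depth` branches, taking BOTH outcomes of every conditional
   branch.  These are the blocks a lookahead prefetcher would need. */
static uint32_t lookahead_set(const struct block *cfg, int n, int start,
                              int depth)
{
    assert(n <= MAX_BLOCKS && start >= 0 && start < n);
    uint32_t seen = (uint32_t)1 << start;
    uint32_t frontier = seen;
    for (int d = 0; d < depth && frontier != 0; d++) {
        uint32_t next = 0;
        for (int b = 0; b < n; b++) {
            if (!(frontier & ((uint32_t)1 << b)))
                continue;
            if (cfg[b].fall >= 0)
                next |= (uint32_t)1 << cfg[b].fall;
            if (cfg[b].taken >= 0)
                next |= (uint32_t)1 << cfg[b].taken;
        }
        frontier = next & ~seen;  /* expand only newly found blocks */
        seen |= next;
    }
    return seen;
}
```

For a loop containing an if/else, with blocks 0 (test), 1 (then), 2
(else), 3 (latch, branching back to 0 or falling out to 4) and 4 (exit),
depths 1, 2 and 3 from block 0 give the masks 0x7, 0xF and 0x1F.
Because every conditional branch contributes both successors, the number
of new blocks per level can double in the worst case, so the depth
parameter bounds how much must be fetched.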


In article <2829@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.com (bill davidsen)
writes:
>> In article <1157@cameron.egr.duke.edu> rfk@egr.duke.edu (Robert F. Krick) 
>> writes:
>> Since the program flow information is extracted prior to execution, SPA 
>> cannot *improve* the performance for branches which have dynamically 
>> defined targets (i.e. jmp r2).  With the notable exception of subroutine
>> returns which SPA can handle without any loss in performance, the degradation
>> associated with this class of branch instructions is insignificant ( << 1%),
>> because these instructions are sufficiently rare in compiled code from 
>> languages such as FORTRAN and C.
>
>  register int (*state)();
>
> /* code */
> if (m < n) (*state)(foo, mumble, barf);
>
> I agree that this is an infrequent case.

Thanks for your support!  There is another (possibly more frequent) situation 
in which branches with dynamically defined targets may be used:  a branch 
table for the switch/case construct in C.  For small branch tables, it is 
more appropriate in SPA to generate code for a binary decision tree (despite 
the slight increase in code size).  The code for these binary decision 
trees can then be executed without any loss in performance.
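
The two ways of compiling a switch can be compared in C.  This is a
generic illustration, not SPA compiler output: h0 through h5 and
h_default are hypothetical case bodies, and C source only suggests the
machine-level shape, since an optimizing compiler is free to lower either
function differently.  In dispatch_table the target is loaded from
memory at run time; in dispatch_tree every branch compares against a
constant and every target is fixed at compile time.

```c
#include <assert.h>

/* Hypothetical case bodies for a six-way switch over 0..5. */
static int h0(void) { return 40; }
static int h1(void) { return 41; }
static int h2(void) { return 42; }
static int h3(void) { return 43; }
static int h4(void) { return 44; }
static int h5(void) { return 45; }
static int h_default(void) { return -1; }

/* Branch-table form: the handler address is loaded from memory at run
   time, so the call is an indirect branch with a dynamic target. */
static int dispatch_table(int x)
{
    static int (*const table[6])(void) = {h0, h1, h2, h3, h4, h5};
    if (x < 0 || x > 5)
        return h_default();
    return table[x]();
}

/* Decision-tree form: each test compares x with a constant and each
   target is a direct call, so every possible successor is known before
   the program runs. */
static int dispatch_tree(int x)
{
    if (x < 0 || x > 5)
        return h_default();
    if (x < 3) {              /* left half: cases 0..2 */
        if (x < 1) return h0();
        if (x < 2) return h1();
        return h2();
    } else {                  /* right half: cases 3..5 */
        if (x < 4) return h3();
        if (x < 5) return h4();
        return h5();
    }
}

/* Check that both forms select the same handler for every x in [lo, hi]. */
static void check_equivalent(int lo, int hi)
{
    for (int x = lo; x <= hi; x++)
        assert(dispatch_table(x) == dispatch_tree(x));
}
```

check_equivalent(-3, 9) confirms that both forms pick the same handler,
including the default for out-of-range values.  The tree trades the
table's single indexed jump for a few extra compare-and-branch
instructions.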

========================================================================
Robert F. Krick; Dept. of Elec. Eng.; Duke University; Durham, NC  27706
    Internet:  rfk@cameron.egr.duke.edu 	AT&T: (919)660-5268

	  "When the branching gets conditional, the prefetching 
	   gets tough, and the tough get going!"