Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!ux1.cso.uiuc.edu!ux1.cso.uiuc.edu!aglew
From: aglew@crhc.uiuc.edu (Andy Glew)
Newsgroups: comp.arch
Subject: Throwaway: Speedup address formation for cache lookup
Message-ID:
Date: 12 Sep 90 04:14:56 GMT
Sender: news@ux1.cso.uiuc.edu (News)
Organization: Center for Reliable and High-Performance Computing University of Illinois at Urbana Champaign
Lines: 39

Here's another of my throwaway ideas[*]:

Want to remove logic from the critical path of your processor? Is address formation/translation/cache lookup on the critical path?

Virtual caches remove address translation, but you still have address formation - since typical addresses are formed by an addition (AMD29000 notwithstanding) - and cache lookup.

Problem: the address addition has to be completed before cache lookup. That might be a while for 64 bit addresses. (Well, you can play around with self timing, or piping out the lsbs before the msbs have formed.) (Or you can use the IBM trick of assuming that the next address will be in the same set as the one previously accessed, to reduce cache setup.)

Possible hacks to reduce the need to wait for the address addition:

(1) Many addresses have limited carry propagation. In the extreme case, you could just OR the base+index together and send that to the cache quickly, falling back to using the full addition in case of an error. (I've already published results on the degree of success of ORing.) Or, you can use limited carry length propagate circuits, which are possibly faster than full carry propagate, again, with fallback in case of error.

(2) Most variables are accessed in only two possible ways: either by a full pointer, or by a single fixed base and offset. Usually only one of the many possible base+offset pairs is used. If this statement is true for all variables in a cache line, then double the tags (one full address, one base+index) might be used, with double the match logic.
This might permit the base+index to be sent directly to the cache, although the extra circuitry + delays might lose any potential speedup.

Like I said, they're throwaway ideas.

[*] I'm bored, waiting for a trace collection to finish.
--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]
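
For anyone who wants to play with hack (1), here is a rough software model of the OR-then-fall-back idea. It is only a sketch, not a circuit description: the names `or_matches_add`, `form_address`, and `struct formed_addr` are made up for illustration, and the check that real hardware would do in parallel with the cache access is modelled here as a simple bit test. It relies on the identity base + index = (base | index) + (base & index), so the OR is exact whenever the two operands have no 1 bits in common.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the address actually used, plus a flag
   saying whether the quick OR guess had to be thrown away. */
struct formed_addr {
    uint64_t addr;
    bool     fell_back;
};

/* base | index equals base + index exactly when no bit position is set
   in both operands, because then no carry is ever generated. */
bool or_matches_add(uint64_t base, uint64_t index)
{
    return (base & index) == 0;
}

/* Model of the speculative path: guess with OR (no carry chain), and
   use the full addition only when the guess turns out to be wrong. */
struct formed_addr form_address(uint64_t base, uint64_t index)
{
    struct formed_addr r;
    uint64_t fast = base | index;   /* quick guess sent to the cache */
    uint64_t full = base + index;   /* slow, always-correct answer   */

    /* Sanity check of the identity the fast path depends on. */
    assert(!or_matches_add(base, index) || fast == full);

    if (fast == full) {
        r.addr = fast;
        r.fell_back = false;
    } else {
        r.addr = full;              /* error case: redo with the add */
        r.fell_back = true;
    }
    return r;
}
```

For example, an aligned base of 0x1000 with index 0x24 shares no bits, so the OR gives 0x1024 with no fallback. A base of 0x1008 with index 0x0C overlaps in bit 3, so the OR gives 0x100C while the real address is 0x1014, and the model falls back to the add.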