Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!samsung!uunet!ccicpg!leo!jack
From: jack@leo.UUCP (Jack Benkual)
Newsgroups: comp.arch
Subject: Re: Throwaway: Speedup address formation for cache lookup
Summary: Other optimizations for address additions followed by cache accesses
Message-ID: <86893@leo.UUCP>
Date: 13 Sep 90 18:48:27 GMT
References: <AGLEW.90Sep11231456@dwarfs.crhc.uiuc.edu>
Organization: ICL North America, Irvine, CA.
Lines: 33

In article <AGLEW.90Sep11231456@dwarfs.crhc.uiuc.edu>, aglew@crhc.uiuc.edu (Andy Glew) writes:
$ Here's another of my throwaway ideas[*]:
$ 
$ Want to remove logic from the critical path of your processor?  Is
$ address formation/translation/cache lookup on the critical path?
$ ....
$ Problem: the address addition has to be completed before cache lookup.
$    That might be a while for 64 bit addresses.
$..... 
$ (1) Many addresses have limited carry propagation.  .....
$     possibly faster than full carry propagate, again, with fallback in case
$     of error.
$ 
$ (2) Most variables are accessed in only two possible ways: 
$     either by a full pointer, or by a single fixed base and offset. .....
$     in a cache line, then triple the tags (one full address, one base+index)
$     might be used, with triple the match logic.  
$     	This might permit the base+index to be sent directly to the cache, 
$     although the extra circuitry + delays might lose any potential speedup.

The above optimizations, as well as the following, are possible when on-chip
caches and MMUs are used:

  (3) The cache index address bits are typically divided into two groups. One
  group is decoded to select a row, and the other group is used to select the
  desired column within that row. The row decoding needs to be done first.
  Interesting simplifications occur when you try to perform the add and the
  decode together. One can build a row decoder that takes the two operands and
  an expected carry from the lower-order bits. It can fetch both possible
  addresses (carry-in of 0 and carry-in of 1) regardless of the incoming carry
  and let the multiplexer select between the two. In any case only a
  relatively small number of bits need to be added before the cache access
  can start. This delay can be balanced by moving more bits to the
  multiplexing phase.
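
  As a rough behavioral sketch of (3), not a description of any real design:
  the Python model below adds only the row-select bits of the two operands,
  once for each possible carry-in, and lets the late-arriving carry from the
  lower-order bits pick between the two candidate rows. The field widths
  (LOW_BITS, ROW_BITS) and all function names are assumptions made up for
  illustration.

```python
# Hypothetical cache geometry, chosen only for illustration.
LOW_BITS = 6   # bits below the row field (line offset plus column select)
ROW_BITS = 7   # bits that are decoded to select a row

LOW_MASK = (1 << LOW_BITS) - 1
ROW_MASK = (1 << ROW_BITS) - 1


def row_field(x):
    """Extract the row-select bits of an operand or address."""
    return (x >> LOW_BITS) & ROW_MASK


def candidate_rows(base, offset):
    """Short ROW_BITS-wide add, done for both possible carry-ins.

    Neither result depends on the low-order bits, so both rows could be
    decoded and accessed while the low-order add is still in progress.
    """
    partial = row_field(base) + row_field(offset)
    return partial & ROW_MASK, (partial + 1) & ROW_MASK


def low_carry(base, offset):
    """Carry out of the low-order add (the late-arriving signal)."""
    return ((base & LOW_MASK) + (offset & LOW_MASK)) >> LOW_BITS


def select_row(base, offset):
    """Multiplexer: pick the speculatively fetched row once the carry is known."""
    row_if_no_carry, row_if_carry = candidate_rows(base, offset)
    return row_if_carry if low_carry(base, offset) else row_if_no_carry


def reference_row(base, offset):
    """Row index taken from a full-width add, for comparison."""
    return row_field(base + offset)
```

  In this model, moving bits out of ROW_BITS and into LOW_BITS shortens the
  early add at the cost of a wider late select, which is one way to read the
  balancing remark above. select_row() should agree with reference_row() for
  any pair of non-negative operands.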