Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!samsung!uunet!ccicpg!leo!jack
From: jack@leo.UUCP (Jack Benkual)
Newsgroups: comp.arch
Subject: Re: Throwaway: Speedup address formation for cache lookup
Summary: Other optimizations for address additions followed by cache accesses
Message-ID: <86893@leo.UUCP>
Date: 13 Sep 90 18:48:27 GMT
References:
Organization: ICL North America, Irvine, CA.
Lines: 33

In article , aglew@crhc.uiuc.edu (Andy Glew) writes:
$ Here's another of my throwaway ideas[*]:
$
$ Want to remove logic from the critical path of your processor? Is
$ address formation/translation/cache lookup on the critical path?
$ ....
$ Problem: the address addition has to be completed before cache lookup.
$ That might be a while for 64 bit addresses.
$.....
$ (1) Many addresses have limited carry propagation. .....
$ possibly faster than full carry propagate, again, with fallback in case
$ of error.
$
$ (2) Most variables are accessed in only two possible ways:
$ either by a full pointer, or by a single fixed base and offset. .....
$ in a cache line, then triple the tags (one full address, one base+index)
$ might be used, with triple the match logic.
$ This might permit the base+index to be sent directly to the cache,
$ although the extra circuitry + delays might lose any potential speedup.

The above optimizations, as well as the following ones, are possible when on-chip caches and MMUs are used:

(3) The cache index address bits are typically divided into two groups. One group is decoded to select the row, and the other group is used to select the desired column. The row decoding needs to be done first. Interesting simplifications occur when you try to perform the add and the decode together. So one can build a row decoder that takes the two operands and an expected carry from the lower-order bits. It can fetch both possible rows, one for each value of the incoming carry, and let the multiplexer select between the two once the real carry is known.
In any case, only a relatively small number of bits need to be added before the cache access can start. The balance can be shifted further by moving more bits to the multiplexing phase, leaving fewer bits for the row decoder.
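A behavioural sketch of idea (3), in Python, may make the timing argument concrete. It is a model under assumed parameters, not the circuit from the post: the cache geometry (`LINE_BITS`, `COL_BITS`, `ROW_BITS`) and every function name are hypothetical. For the "add and decode" step it substitutes a known per-bit trick, often called sum-addressed decoding, which checks whether `a + b + cin` equals a given row number using only each bit and its lower neighbour, so no carry chain is needed. Both candidate rows are selected without waiting for the low-order add, whose carry then drives the late multiplexer.

```python
import random

# Hypothetical cache geometry (assumption, not taken from the post).
LINE_BITS = 6   # byte offset within a cache line
COL_BITS = 2    # "column" index bits, resolved by the late multiplexer
ROW_BITS = 6    # "row" index bits, fed to the row decoder
LOW_BITS = LINE_BITS + COL_BITS
ROW_MASK = (1 << ROW_BITS) - 1
LOW_MASK = (1 << LOW_BITS) - 1


def row_matches(a: int, b: int, k: int, cin: int, width: int = ROW_BITS) -> bool:
    """True iff (a + b + cin) mod 2**width == k.

    Checked bit by bit without a carry chain: bit i only looks at bits i and
    i-1 of a, b and k (sum-addressed decoding, used here as an illustration)."""
    for i in range(width):
        ai, bi, ki = (a >> i) & 1, (b >> i) & 1, (k >> i) & 1
        if i == 0:
            carry_in = cin
        else:
            ap, bp, kp = (a >> (i - 1)) & 1, (b >> (i - 1)) & 1, (k >> (i - 1)) & 1
            # Carry out of bit i-1, assuming sum bit i-1 already equals k's bit.
            carry_in = (ap & bp) | ((ap | bp) & (kp ^ 1))
        if ai ^ bi ^ ki != carry_in:
            return False
    return True


def decode(a_row: int, b_row: int, cin: int) -> list[int]:
    """Wordlines that fire: exactly one row, equal to a_row + b_row + cin."""
    return [k for k in range(1 << ROW_BITS) if row_matches(a_row, b_row, k, cin)]


def cache_row_for(base: int, offset: int) -> int:
    """Row selected for address base + offset, via dual decode plus late mux."""
    a_row = (base >> LOW_BITS) & ROW_MASK
    b_row = (offset >> LOW_BITS) & ROW_MASK
    # Both decodes can start at once; neither waits for the low-order add.
    (row_if_c0,) = decode(a_row, b_row, 0)
    (row_if_c1,) = decode(a_row, b_row, 1)
    # In parallel, the narrow low-order add yields the real carry into the row field.
    carry = ((base & LOW_MASK) + (offset & LOW_MASK)) >> LOW_BITS
    # The late multiplexer picks one of the two rows already being read.
    return row_if_c1 if carry else row_if_c0


if __name__ == "__main__":
    rng = random.Random(1990)
    for _ in range(200):
        base, offset = rng.getrandbits(32), rng.getrandbits(32)
        full_add_row = ((base + offset) >> LOW_BITS) & ROW_MASK
        assert cache_row_for(base, offset) == full_add_row
    print("dual-row decode agrees with the full add")
```

Raising `COL_BITS` at the expense of `ROW_BITS` models the rebalancing described above: fewer bits pass through the decoder, and more are resolved by the multiplexer after the carry arrives.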