Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!IBM.COM!JOSH
From: JOSH@IBM.COM ("Josh Knight")
Newsgroups: comp.arch
Subject: Re: Parallel cache and TLB lookup
Message-ID: <9004110303.AA21059@ucbvax.Berkeley.EDU>
Date: 11 Apr 90 03:00:25 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Lines: 47

In <1830@gannet.cl.cam.ac.uk> cet1@cl.cam.ac.uk (C.E. Thompson) writes:

> This brought to mind a question that has been niggling me for some
> years: how is the trick worked when the software is *not* so
> constrained? The IBM 308x and 3090 mainframes have (mostly) 64K caches
> (per processor) which are 4-way set associative; and again only the
> bottom 12 bits of the address are invariant under the virtual-to-real
> mapping. However, the software is allowed to (and IBM operating systems
> in fact do) reference a page of storage at different times by both
> virtual and real addresses, whose low-order 14 bits will not, usually,
> be equal.
>
> How is it done? I have never found an answer in the review articles in
> the IBM journals (R&D, Systems). Is it, perhaps, a trade secret? In
> earlier models, such as the IBM 3033, as the cache increased in size
> so did the multiplicity of the associative lookup.
>

The answer for the 3090 is in the article referenced in the appended
refer format citation, in this quote from page 10 of the cited article:

   An interesting complexity in cache design that has been given special
   treatment in the 3090 cache has to do with synonyms. Virtual storage
   in System/370-XA architecture allows relocation of 4K-byte pages.
   This means that the low-order 12 address bits that address a byte
   within a page are the same for both a virtual and a real address.
   Architecture, however, allows different virtual addresses to map to
   the same real address. Thus the cache is managed by real addresses,
   despite the fact that it is accessed by virtual address.
   Since it takes 16 bits to address a 64K-byte cache and there are only
   12 real bits available, we lack four bits. There are thus 16 places
   in the cache where an operand might reside. Four of these locations
   are read out of the cache simultaneously on the initial cache read
   operation. The directory, however, is built to read out all 16
   entries simultaneously. Thus, if there is a miss on all of the
   primary four locations but a hit on one of the other 12, the cache
   can be read correctly with a minimum delay.

%T The IBM 3090 System: An Overview
%A S.G. Tucker
%J IBM Systems Journal
%V 25
%N 1
%P 4-19
%D 1986

Josh Knight
josh@ibm.com
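
The arithmetic in the quoted passage can be sketched in a few lines of
Python. This is only an illustration, not a description of the actual
3090 hardware. The sizes (64K-byte cache, 4-way set associative, 4K-byte
pages) come from the posts above; the function name candidate_locations,
the (way, offset) representation, and the reading of the four missing
bits as two untranslated index bits plus two bits of way selection are
assumptions made for the sketch.

```python
# Sizes taken from the posts above.
CACHE_BYTES = 64 * 1024   # 64K-byte cache
WAYS = 4                  # 4-way set associative
PAGE_BYTES = 4 * 1024     # 4K-byte pages

# Derived quantities (assumed organization: each way is a flat 16K array).
WAY_BYTES = CACHE_BYTES // WAYS            # 16K bytes per way
INDEX_BITS = WAY_BYTES.bit_length() - 1    # 14 bits select a byte in a way
PAGE_BITS = PAGE_BYTES.bit_length() - 1    # 12 bits survive translation
SYNONYM_BITS = INDEX_BITS - PAGE_BITS      # 2 index bits may differ
SYNONYM_CLASSES = 1 << SYNONYM_BITS        # 4 possible index values


def candidate_locations(addr):
    """Hypothetical helper: list the places a byte at addr could occupy.

    Returns (primary, secondary), each a list of (way, offset_in_way).
    primary uses the index bits as presented in addr; secondary covers
    the other values the untranslated index bits could have taken.
    """
    page_offset = addr & (PAGE_BYTES - 1)
    guess = (addr >> PAGE_BITS) & (SYNONYM_CLASSES - 1)
    primary = [(way, (guess << PAGE_BITS) | page_offset)
               for way in range(WAYS)]
    secondary = [(way, (cls << PAGE_BITS) | page_offset)
                 for cls in range(SYNONYM_CLASSES) if cls != guess
                 for way in range(WAYS)]
    return primary, secondary


if __name__ == "__main__":
    primary, secondary = candidate_locations(0x12345)
    print(len(primary), "primary locations,", len(secondary), "others")
```

Under these assumptions the four primary locations correspond to the
four read out on the initial cache read, and the primary and secondary
lists together correspond to the 16 directory entries examined at once.
Two addresses that share their low-order 12 bits produce the same 16
candidates, which is why a synonym can still be found.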