Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!IBM.COM!JOSH
From: JOSH@IBM.COM ("Josh Knight")
Newsgroups: comp.arch
Subject: Re: Parallel cache and TLB lookup
Message-ID: <9004110303.AA21059@ucbvax.Berkeley.EDU>
Date: 11 Apr 90 03:00:25 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Lines: 47

In <1830@gannet.cl.cam.ac.uk> cet1@cl.cam.ac.uk (C.E. Thompson) writes:
 > This brought to mind a question that has been niggling me for some
 > years: how is the trick worked when the software is *not* so
 > constrained? The IBM 308x and 3090 mainframes have (mostly) 64K caches
 > (per processor) which are 4-way set associative; and again only the
 > bottom 12 bits of the address are invariant under the virtual-to-real
 > mapping. However, the software is allowed to (and IBM operating systems
 > in fact do) reference a page of storage at different times by both
 > virtual and real addresses, whose low-order 14 bits will not, usually,
 > be equal.
 >
 > How is it done? I have never found an answer in the review articles in
 > the IBM journals (R&D, Systems). Is it, perhaps, a trade secret? In
 > earlier models, such as the IBM 3033, as the cache increased in size
 > so did the multiplicity of the associative lookup.
 >

The answer for the 3090 is in the article cited below in refer format.
This passage is from page 10:

    An interesting complexity in cache design that has been given special
    treatment in the 3090 cache has to do with synonyms.  Virtual storage
    in System/370-XA architecture allows relocation of 4K-byte pages.  This
    means that the low-order 12 address bits that address a byte within a
    page are the same for both a virtual and a real address.  Architecture,
    however, allows different virtual addresses to map to the same real
    address. Thus the cache is managed by real addresses, despite the fact
    that it is accessed by virtual address.  Since it takes 16 bits to
    address a 64K-byte cache and there are only 12 real bits available, we
    lack four bits.  There are thus 16 places in the cache where an operand
    might reside.  Four of these locations are read out of the cache
    simultaneously on the initial cache read operation.  The directory,
    however, is built to read out all 16 entries simultaneously.  Thus, if
    there is a miss on all of the primary four locations but a hit on one of
    the other 12, the cache can be read correctly with a minimum delay.
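
The arithmetic in that passage can be sketched in a few lines of Python.
This is only an illustration, not the 3090 implementation. The cache size,
page size and 4-way associativity come from the text above. The 128-byte
line size, the reading of the 16 places as 4 candidate sets times 4 ways,
and every name in the code (candidate_sets, lookup, the dict used as a
directory) are assumptions made for the example.

```python
CACHE_BYTES = 64 * 1024   # from the quoted article
PAGE_BYTES = 4 * 1024     # from the quoted article
WAYS = 4                  # from the quoted question (4-way set associative)
LINE_BYTES = 128          # ASSUMPTION: line size chosen for illustration only

WAY_BYTES = CACHE_BYTES // WAYS            # 16K bytes per way
SETS = WAY_BYTES // LINE_BYTES             # sets in the cache
SETS_PER_PAGE = PAGE_BYTES // LINE_BYTES   # sets selected by untranslated bits
ALIASES = SETS // SETS_PER_PAGE            # sets that can hold one page offset


def missing_bits():
    """Address bits needed to index the cache minus bits that survive translation."""
    return (CACHE_BYTES.bit_length() - 1) - (PAGE_BYTES.bit_length() - 1)


def candidate_places():
    """Number of cache locations where a given operand might reside."""
    return 1 << missing_bits()


def candidate_sets(addr):
    """Sets that could hold the line, with the set indexed by addr itself first."""
    low = (addr % PAGE_BYTES) // LINE_BYTES      # index bits inside the page offset
    primary = (addr % WAY_BYTES) // LINE_BYTES   # index using the translated bits too
    others = [low + k * SETS_PER_PAGE for k in range(ALIASES)]
    others.remove(primary)
    return [primary] + others


def lookup(directory, vaddr, raddr):
    """Search a hypothetical directory {(set, way): real line number}.

    Returns ("primary", set, way) for a hit in the four locations read first,
    ("synonym", set, way) for a hit in one of the other twelve, else a miss.
    """
    real_line = raddr // LINE_BYTES
    for i, s in enumerate(candidate_sets(vaddr)):
        for way in range(WAYS):
            if directory.get((s, way)) == real_line:
                return ("primary" if i == 0 else "synonym", s, way)
    return ("miss", None, None)
```

For example, a line installed through real address 0x7080 lands in one
set, while virtual address 0x12080 (same low-order 12 bits, different
bits above them) indexes another. With this model, lookup() finds the
line among the twelve secondary locations instead of reporting a miss.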

%T The IBM 3090 System:  An Overview
%A S.G. Tucker
%J IBM Systems Journal
%V 25
%N 1
%P 4-19
%D 1986


Josh Knight
josh@ibm.com