Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site amdahl.UUCP
Path: utzoo!linus!philabs!cmcl2!seismo!harvard!think!mit-eddie!genrad!decvax!decwrl!sun!amdahl!mat
From: mat@amdahl.UUCP (Mike Taylor)
Newsgroups: net.arch
Subject: Re: Cache revisited
Message-ID: <1838@amdahl.UUCP>
Date: Thu, 25-Jul-85 13:03:03 EDT
Article-I.D.: amdahl.1838
Posted: Thu Jul 25 13:03:03 1985
Date-Received: Sun, 28-Jul-85 04:45:36 EDT
References: <5374@fortune.UUCP> <268@gcc-bill.ARPA>
Distribution: net
Organization: Amdahl Corp, Sunnyvale CA
Lines: 50

> Could someone who has a decent understanding of memory management systems
> give me a short discourse on the following?

The fact that I make a comment does not imply any pretensions of a
decent understanding.

>
> I'd like to compare and contrast the difference in performance between a
> simple single level paged memory manager using a ram (a la Sage 68000) and
> a system like the IBM DAT box, where the page tables are stored in main memory
> and cached in hardware. The point being that switching context is MUCH
> faster if you only need to change the pointer to the page tables, rather than
> copy 8K of paging information into the page table ram. It is assumed that
> the cache used to speed up the main memory page table accesses is sufficiently
> large to get a good hit rate (whatever that may be).
>

In fact, the context switch in S/370 does not require any massive copies.
A CPU control register contains the address of the segment tables
associated with the current address space. This is called the Segment
Table Origin (STO). A cached list contains some (implementation-dependent)
set of these values, and maps them to a small number, the STO ID.
Translations are cached in a buffer called the Translation Lookaside
Buffer (TLB). Each translation in the TLB is associated with a particular
STO ID, or else is marked as being common to all address spaces
(Common Segment).
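
The tagging scheme described so far can be sketched in a few lines of
Python. This is only a toy model under stated assumptions, not a
description of any real S/370 implementation: the class and method names,
the 4K page size, the four-entry STO list, and the oldest-first
replacement of STO IDs are all invented for illustration.

```python
PAGE_SHIFT = 12   # assumption: 4K pages, chosen only for illustration
COMMON = -1       # hypothetical tag value for Common Segment entries


class TaggedTLB:
    """Toy TLB whose entries are tagged with a small STO ID."""

    def __init__(self, max_sto_ids=4):
        self.max_sto_ids = max_sto_ids   # size of the cached STO list (assumed)
        self.sto_to_id = {}              # cached list: STO value -> STO ID
        self.entries = {}                # (tag, virtual page) -> real frame
        self.current_id = None

    def load_sto(self, sto):
        """Context switch: only the STO changes; no page tables are copied."""
        if sto not in self.sto_to_id:
            if len(self.sto_to_id) >= self.max_sto_ids:
                # Simplistic policy: recycle the oldest STO ID and drop
                # the translations that were tagged with it.
                victim_sto = next(iter(self.sto_to_id))
                new_id = self.sto_to_id.pop(victim_sto)
                self.entries = {k: v for k, v in self.entries.items()
                                if k[0] != new_id}
            else:
                new_id = len(self.sto_to_id)
            self.sto_to_id[sto] = new_id
        self.current_id = self.sto_to_id[sto]

    def insert(self, vpage, frame, common=False):
        """Cache a translation for the current address space (or for all)."""
        tag = COMMON if common else self.current_id
        self.entries[(tag, vpage)] = frame

    def lookup(self, vaddr):
        """Return the real address, or None on a TLB miss."""
        vpage = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        for tag in (self.current_id, COMMON):
            frame = self.entries.get((tag, vpage))
            if frame is not None:
                return (frame << PAGE_SHIFT) | offset
        return None   # real hardware would now walk the tables in main storage

    def invalidate(self, vpage):
        """Selective invalidation: drop every entry for one virtual page."""
        self.entries = {k: v for k, v in self.entries.items() if k[1] != vpage}

    def purge(self):
        """Complete invalidation of the TLB."""
        self.entries.clear()
```

In this model, switching back to an address space whose STO is still in
the cached list finds its old translations intact, which is the point of
the STO ID.
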
Therefore, many translations for the same virtual address may reside in
the TLB, each associated with a different address space by means of the
STO ID. Instructions are provided to selectively or completely
invalidate entries in the TLB.

The reason for caching the entries relates to the cycle time objectives
for the machine. If you use the simple hardware, then main storage access
time is factored into the cycle time for address translation. In our
implementation of S/370, this would mean substituting (say) 200 ns. main
storage for the 7.5 ns. RAMs used. The difference would add directly to
cycle time (simplistically, at least), which would make the machine run
about 9 times slower, ignoring the effects of TLB misses, which are very
closely related to cache misses in our machine. The reason for the
relation is that we use a virtually addressed cache and therefore include
the TLB information in the cache tag. The effects of TLB misses, however,
are generally quite small in high-end systems.

This dramatic difference relates directly to the performance difference
between the cache RAM and main storage, relative to the machine cycle
time (23.25 ns., i.e. 43 MHz).
-- 
Mike Taylor ...!{ihnp4,hplabs,amd,sun}!amdahl!mat

[ This may not reflect my opinion, let alone anyone else's. ]
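
The "about 9 times slower" estimate in the post can be checked with a
few lines of arithmetic. This is a sketch of the simplistic model the
post itself describes (the access-time difference added directly to
every cycle); the function names are invented here, and the 200 ns.
figure is the post's own "(say)" value, not a measurement.

```python
MAIN_STORAGE_NS = 200.0   # the post's "(say)" main storage access time
TLB_RAM_NS = 7.5          # RAM access time quoted in the post
CYCLE_NS = 23.25          # machine cycle time quoted in the post


def slowdown(cycle_ns, fast_ns, slow_ns):
    """Factor by which the cycle stretches if the access-time difference
    is added directly to every cycle (the simplistic model)."""
    return (cycle_ns + (slow_ns - fast_ns)) / cycle_ns


def clock_mhz(cycle_ns):
    """Clock rate in MHz for a given cycle time in nanoseconds."""
    return 1000.0 / cycle_ns


print(f"slowdown: {slowdown(CYCLE_NS, TLB_RAM_NS, MAIN_STORAGE_NS):.2f}x")
print(f"clock:    {clock_mhz(CYCLE_NS):.2f} MHz")
```

With the quoted figures the stretched cycle is 23.25 + 192.5 = 215.75
ns., a factor of roughly 9.3, consistent with "about 9 times".
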