Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site amdahl.UUCP
Path: utzoo!linus!philabs!cmcl2!seismo!harvard!think!mit-eddie!genrad!decvax!decwrl!sun!amdahl!mat
From: mat@amdahl.UUCP (Mike Taylor)
Newsgroups: net.arch
Subject: Re: Cache revisited
Message-ID: <1838@amdahl.UUCP>
Date: Thu, 25-Jul-85 13:03:03 EDT
Article-I.D.: amdahl.1838
Posted: Thu Jul 25 13:03:03 1985
Date-Received: Sun, 28-Jul-85 04:45:36 EDT
References: <5374@fortune.UUCP> <268@gcc-bill.ARPA>
Distribution: net
Organization: Amdahl Corp, Sunnyvale CA
Lines: 50

> Could someone who has a decent understanding of memory management systems
> give me a short discourse on the following?

The fact that I make a comment does not imply any pretensions of
a decent understanding.

> 
> I'd like to compare and contrast the difference in performance between a
> simple single level paged memory manager using a ram (a la Sage 68000) and
> a system like the IBM DAT box, where the page tables are stored in main memory
> and cached in hardware. The point being that switching context is MUCH
> faster if you only need to change the pointer to the page tables, rather than
> copy 8K of paging information into the page table ram. It is assumed that
> the cache used to speed up the main memory page table accesses is sufficiently
> large to get a good hit rate (whatever that may be).
> 
In fact, the context switch in S/370 does not require any massive copies.
A CPU control register contains the address of the segment tables
associated with the current address space. This is called the
Segment Table Origin (STO). A cached list contains some
(implementation-dependent) set of these values, and maps them to a
small number, the STO ID. Translations are cached in a buffer
called the Translation Lookaside Buffer (TLB). Each translation
in the TLB is associated with a particular STO ID, or else is marked as
being common to all address spaces (Common Segment). Therefore,
many translations for the same virtual address may reside in the TLB,
each associated with a different address space by means of the STO ID.
Instructions are provided to selectively or completely invalidate
entries in the TLB.
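
To make the tagging scheme concrete, here is a minimal software sketch
of a TLB whose entries are keyed by STO ID. It is an illustration only,
not a description of any real S/370 implementation: the class name, the
method names, the COMMON marker, and the unbounded dictionaries are all
assumptions, and a real machine holds a small fixed number of STO IDs
and TLB entries with some replacement policy.

```python
COMMON = "common"  # hypothetical tag for Common Segment entries


class TaggedTLB:
    """Toy model of a TLB whose entries are tagged with an STO ID."""

    def __init__(self):
        self.sto_ids = {}    # STO value -> small STO ID
        self.entries = {}    # (STO ID or COMMON, virtual page) -> frame
        self.current = None  # STO ID of the running address space

    def load_sto(self, sto):
        """Context switch: only the STO changes; no tables are copied."""
        if sto not in self.sto_ids:
            self.sto_ids[sto] = len(self.sto_ids)
        self.current = self.sto_ids[sto]

    def insert(self, vpage, frame, common=False):
        """Cache one translation for the current space (or for all)."""
        tag = COMMON if common else self.current
        self.entries[(tag, vpage)] = frame

    def lookup(self, vpage):
        """Return the cached frame, or None on a TLB miss."""
        hit = self.entries.get((self.current, vpage))
        if hit is None:
            hit = self.entries.get((COMMON, vpage))
        return hit

    def invalidate_page(self, vpage):
        """Selective invalidation: drop every entry for one page."""
        for key in [k for k in self.entries if k[1] == vpage]:
            del self.entries[key]

    def purge(self):
        """Complete invalidation of the TLB."""
        self.entries.clear()


tlb = TaggedTLB()
tlb.load_sto(0x1000)
tlb.insert(5, 100)       # page 5 in the first address space
tlb.load_sto(0x2000)
tlb.insert(5, 200)       # same virtual page, different space
print(tlb.lookup(5))     # 200
tlb.load_sto(0x1000)     # switch back: the old entry is still there
print(tlb.lookup(5))     # 100
```

Switching back to the first address space finds its translation still
cached, which is the point of tagging entries rather than flushing them.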

The reason for caching the entries relates to the cycle time objectives
for the machine.  If you use the simple hardware, then main storage access
time is factored into the cycle time for address translation.  In our
implementation of S/370, this would mean substituting (say) 200 ns
main storage for the 7.5 ns RAMs used.  The difference (192.5 ns) would
add directly to the 23.25 ns cycle time (simplistically, at least),
stretching it to about 216 ns and making the machine run about 9 times
slower.  That figure ignores the effects of TLB misses, which are very
closely related to cache misses in our machine: we use a virtually
addressed cache and therefore include the TLB information in the cache
tag.  The effects of TLB misses, however, are generally quite small in
high-end systems.

This dramatic difference follows directly from the speed gap between
the cache RAM and main storage, measured against the machine
cycle time (23.25 ns, or about 43 MHz).
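
The "about 9 times" figure can be checked with a few lines of
arithmetic. This is only a sketch of the simplistic model described
above, in which the extra access time adds straight onto every cycle;
the 200 ns main storage figure is the illustrative "(say)" value, and
the names below are made up for the example.

```python
CYCLE_NS = 23.25   # machine cycle time
RAM_NS = 7.5       # fast RAMs used for translation
MAIN_NS = 200.0    # illustrative main storage access time


def slowed_cycle_ns(cycle_ns=CYCLE_NS, ram_ns=RAM_NS, main_ns=MAIN_NS):
    """Cycle time if main storage replaced the fast RAMs (simplistic)."""
    return cycle_ns + (main_ns - ram_ns)


slowdown = slowed_cycle_ns() / CYCLE_NS
clock_mhz = 1000.0 / CYCLE_NS   # 1000 / ns gives MHz

print(f"{slowed_cycle_ns():.2f} ns cycle, {slowdown:.1f}x slower, "
      f"{clock_mhz:.1f} MHz base clock")
# prints: 215.75 ns cycle, 9.3x slower, 43.0 MHz base clock
```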
-- 
Mike Taylor                        ...!{ihnp4,hplabs,amd,sun}!amdahl!mat

[ This may not reflect my opinion, let alone anyone else's.  ]