Path: utzoo!attcan!uunet!lll-winken!ames!vsi1!wyse!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Virtual caches & PIDs [was Re: i860 CPU information]
Message-ID: <15531@winchester.mips.COM>
Date: 19 Mar 89 01:17:46 GMT
References: <24869@amdcad.AMD.COM> <2280004@hpsal2.HP.COM>
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 83

In article <2280004@hpsal2.HP.COM> viggy@hpsal2.HP.COM (Viggy Mokkarala) writes:
>John Mashey writes:
>>This is NOT a popular misconception.  If you have a "simple"
>>virtual-addressed/virtual-tagged cache, i.e., as in the i860, with
>>neither pids/asids, nor the (more complex) segment-style scheme of HP PA,
>>then you will flush the caches on context switches, and you might do it
>>more often, depending on the TLB/cache interactions, and how tricky the
>>OS wants to get in deferring flushes.

>Just to set the record straight about HP PA, I'd like to make the following
>observations:
(description of HP PA scheme - thanx)

Sorry if my complex sentence caused confusion.  It was trying to say:
	IF "simple" scheme
	THEN flush caches on context switch
	ELSE /* PIDS or segment/space scheme of HP PA */
	     don't flush caches.

HP PA is clearly one of the RISCs designed with attention to
multi-tasking, since it, for example, manages to use virtual-addressed
caches while still sharing sharable code & data in the cache efficiently,
unlike simple PID schemes.

For example, suppose, in a multi-user system, using a simple cache-PID
scheme, you have several people using the same program (such as an
editor, compiler, DBMS client, etc, or one other program described later):

Suppose you have executed process A, using that program, and you
context-switch to process B, using that same program.  You switch to the
new process's PID, and even though it may be executing the exact same
code, it I-cache-misses on all of it, because the cached lines carry the
wrong PID.  Same thing happens for shared libraries.
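
A toy model makes the effect above easy to see.  This is only a sketch,
not any real machine's cache: it assumes a direct-mapped cache whose tag
is (PID, virtual line number), and the class name, sizes, and addresses
are all made up for illustration.  Process A and process B fetch the same
512 bytes of shared program text at the same virtual address.

```python
class PidTaggedCache:
    """Toy direct-mapped virtual cache; the tag includes the PID (assumed model)."""

    def __init__(self, num_lines=64, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = [None] * num_lines  # each entry: (pid, virtual line number)
        self.misses = 0

    def access(self, pid, vaddr):
        line_no = vaddr // self.line_size
        index = line_no % self.num_lines
        tag = (pid, line_no)
        if self.lines[index] == tag:
            return True                  # hit: same PID and same virtual line
        self.lines[index] = tag          # miss: refill the line under this PID
        self.misses += 1
        return False


def run(cache, pid, start, length, step=4):
    """Fetch `length` bytes starting at `start`; return the misses incurred."""
    before = cache.misses
    for vaddr in range(start, start + length, step):
        cache.access(pid, vaddr)
    return cache.misses - before


EDITOR_TEXT = 0x1000                     # hypothetical address of the shared code
cache = PidTaggedCache()
cold_a = run(cache, pid=1, start=EDITOR_TEXT, length=512)    # A, cold cache
warm_a = run(cache, pid=1, start=EDITOR_TEXT, length=512)    # A again, warm
switch_b = run(cache, pid=2, start=EDITOR_TEXT, length=512)  # B, same code
print(cold_a, warm_a, switch_b)
```

With these assumed sizes it prints "32 0 32": B pays the full set of cold
misses on code that is already sitting in the cache, purely because the
PID half of the tag differs.
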
Same thing happens if it uses shared data, which things like DBMS do, or
perhaps, on some systems, X-clients & X-server; only this time it
D-cache misses.  This is especially exciting for dirty data: A writes
some data.  B attempts to read it.  Since the PIDs mismatch, you flush
it to memory, then you re-read it to get a copy there with the right PID.

Does this matter?  If the caches are really small, then not much, as far
as I can tell, although there isn't a lot of data published on this kind
of thing, for good reason.  {If you know a lot about this, you're
probably a vendor who treats this as serious competitive info.}
However, high-performance systems don't have small caches, and the
caches MUST continue to grow, since DRAM refuses to offer
seriously-better access times.

Note that HP PA avoids most of this problem via the space identifiers,
which are, in some sense, a collection of PIDs or ASIDs.  Thus, if you
switch to a different process that's using the same program, the second
process's space-maps (or whatever they call them) can at least point at
the same code as did the first process, and there can be shared data
regions, etc, without requiring cache-misses to refill the cache.
The obvious issue that's left is doing fork-with-copy-on-write, as that
looks a bit difficult to do in the classical way.  Still, the designers
were OBVIOUSLY thinking of multi-tasking environments with
efficiently-shared code and data, in the presence of large caches.
(Ross Bott of Pyramid gave a good talk at Uniforum that included some
discussions of cache issues in OLTP environments.)

Oh, the one other program that's real important is the UNIX kernel
itself, which, in some sense, is often treated as a giant shared
library.  It has terrible locality, and so you have to kick and scratch
to get every % of hit rate you can.  Note that the most straightforward
of PID schemes causes considerable extra thrashing around in the
presence of serious context-switch rates.
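
For contrast with a plain PID tag, here is an equally rough sketch of the
space-identifier idea described above.  It is a loose model and not the
actual HP PA mechanism: it assumes the cache tag is (space id, line
number within the space), and the SPACES table, the space numbers, and
the offsets are invented for illustration.  Processes A and B share one
space for program text (which could just as well stand for a shared
library or kernel text) and keep private data spaces.

```python
class SpaceTaggedCache:
    """Toy direct-mapped cache tagged by (space id, line number); assumed model."""

    def __init__(self, num_lines=64, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = [None] * num_lines  # each entry: (space id, line number)
        self.misses = 0

    def access(self, space_id, offset):
        line_no = offset // self.line_size
        index = line_no % self.num_lines
        tag = (space_id, line_no)
        if self.lines[index] == tag:
            return True                  # hit, whichever process is running
        self.lines[index] = tag
        self.misses += 1
        return False


# Hypothetical per-process space maps: text is shared, data is private.
SPACES = {
    "A": {"text": 10, "data": 11},
    "B": {"text": 10, "data": 12},
}


def run(cache, proc, region, start, length, step=4):
    """Touch `length` bytes of a region for a process; return misses incurred."""
    space_id = SPACES[proc][region]
    before = cache.misses
    for offset in range(start, start + length, step):
        cache.access(space_id, offset)
    return cache.misses - before


cache = SpaceTaggedCache()
text_a = run(cache, "A", "text", 0x1000, 512)  # A, cold cache
data_a = run(cache, "A", "data", 0x1200, 256)  # A's private data
text_b = run(cache, "B", "text", 0x1000, 512)  # B, same code, shared space
data_b = run(cache, "B", "data", 0x1200, 256)  # B's private data
print(text_a, data_a, text_b, data_b)
```

With these assumptions it prints "32 16 0 16": B gets the shared text for
free after the switch, and only the genuinely private data misses.
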
For the kernel, the typical choices come down to letting it use the same
PID as the current user program, which means you communicate well with
it; or using a specific PID for the kernel itself, which means better
hit rates for the kernel code, but more overhead in dealing with user
processes; or combinations that switch back and forth.
(maybe somebody who does this kind of thing can post some useful info)

Of course, numerous variations are possible, but the basic idea is:
if you're using simple virtual caches, with no PIDs, or even with PIDs,
you're probably thinking more about single-user performance than you are
about multi-tasking performance.  (There's nothing wrong with that
tradeoff, of course, if that's what you're trying to do.)
-- 
-john mashey	DISCLAIMER:
UUCP:	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:	408-991-0253 or 408-720-1700, x253
USPS:	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086