Path: utzoo!attcan!uunet!lll-winken!ames!vsi1!wyse!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Virtual caches & PIDs [was Re: i860 CPU information]
Message-ID: <15531@winchester.mips.COM>
Date: 19 Mar 89 01:17:46 GMT
References: <24869@amdcad.AMD.COM> <2280004@hpsal2.HP.COM>
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 83

In article <2280004@hpsal2.HP.COM> viggy@hpsal2.HP.COM (Viggy Mokkarala) writes:

>John Mashey writes:

>>This is NOT a popular misconception.  If you have a "simple"
>>virtual-addressed/virtual-tagged cache, i.e., as in the i860, with
>>neither pids/asids, nor the (more complex) segment-style scheme of HP PA,
>>then you will flush the caches on context switches, and you might do it
>>more often, depending on the TLB/cache interactions, and how tricky the
>>OS wants to get in deferring flushes.

>Just to set the record straight about HP PA, I'd like to make the following
>observations:
	(description of HP PA scheme - thanx)
Sorry if my complex sentence caused confusion.  It was trying to say:

IF "simple" scheme, THEN flush caches on context switch
ELSE /* PIDs or segment/space scheme of HP PA */ don't flush caches.
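A toy sketch of that rule, in Python, assuming a virtually-tagged cache whose tags carry a PID/ASID only when the hardware provides one. The names `Cache`, `has_asids` and `context_switch` are made up for illustration, not any real OS or chip interface. Without PIDs, the same virtual address in two processes usually names different physical memory, so a line left over from A would hand B the wrong data; the flush is what keeps it correct.

```python
# Toy model of the flush-on-context-switch rule. Cache, has_asids, and
# context_switch are hypothetical names, not a real OS or chip interface.

class Cache:
    """Virtually-tagged cache modeled as a dict from tag to data."""

    def __init__(self, has_asids):
        self.has_asids = has_asids
        self.current_asid = 0
        self.lines = {}

    def tag(self, vaddr):
        # With PIDs/ASIDs the tag includes the running process's id;
        # in the "simple" scheme it is the bare virtual address.
        return (self.current_asid, vaddr) if self.has_asids else vaddr

    def lookup(self, vaddr):
        return self.lines.get(self.tag(vaddr))

    def fill(self, vaddr, data):
        self.lines[self.tag(vaddr)] = data

    def flush(self):
        # A write-back cache would also write dirty lines to memory here.
        self.lines.clear()


def context_switch(cache, new_asid):
    if not cache.has_asids:
        cache.flush()              # simple scheme: stale lines would alias
    cache.current_asid = new_asid  # PID scheme: just change the current id


# Simple scheme: A's line is gone after switching to B.
simple = Cache(has_asids=False)
context_switch(simple, 1)
simple.fill(0x1000, "A's data")
context_switch(simple, 2)
print(simple.lookup(0x1000))   # None

# PID scheme: B cannot see A's line, but it survives until A runs again.
tagged = Cache(has_asids=True)
context_switch(tagged, 1)
tagged.fill(0x1000, "A's data")
context_switch(tagged, 2)
print(tagged.lookup(0x1000))   # None
context_switch(tagged, 1)
print(tagged.lookup(0x1000))   # A's data
```

The PID version trades a wider tag for not having to throw away the whole cache on every switch.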

HP PA is clearly one of the RISCs designed with attention to
multi-tasking: for example, it manages to use virtual-addressed
caches while still sharing sharable code & data in the cache efficiently,
unlike simple PID schemes.  For example, suppose, in a multi-user
system, using a simple cache-PID scheme, you have several people using
the same program (such as an editor, compiler, DBMS client, etc,
or one other program described later):
	Suppose you have executed process A, using that program,
	and you context-switch to process B, using that same program.
	You switch to the new process's PID, and even though it may
	be executing the exact same code, it I-cache-misses on all of it,
	because it has the wrong PID. Same thing happens for shared libraries.
	Same thing happens if it uses shared data, which things like
	DBMS do, or perhaps, on some systems, X clients & the X server;
	only this time it D-cache misses.  This is especially exciting
	for dirty data: A writes some data.  B attempts to read it.
	Since the PIDs mismatch, you flush A's copy to memory, then you
	re-read it so that B's cache line holds the right copy.
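The dirty-data case can be replayed with a toy write-back cache tagged by (pid, vaddr). This is a sketch under stated assumptions, not how any particular machine does it: it assumes the system can find another PID's copy of the same physical location and write it back before B's refill. `PidCache`, the addresses and the PIDs are all hypothetical.

```python
# Toy write-back cache tagged by (pid, vaddr), replaying the dirty-data case:
# A writes a shared location, then B reads the same location.
# Assumption: the system can find another PID's copy of the same physical
# location (alias) and write it back; real machines differ in how they do this.

class PidCache:
    def __init__(self, memory, page_map):
        self.memory = memory      # physical address -> value
        self.page_map = page_map  # (pid, vaddr) -> physical address
        self.lines = {}           # (pid, vaddr) -> [value, dirty]
        self.writebacks = 0
        self.memory_reads = 0

    def _evict_aliases(self, paddr):
        # Drop other PIDs' copies of paddr, writing dirty ones to memory first.
        for key in [k for k in self.lines if self.page_map[k] == paddr]:
            value, dirty = self.lines.pop(key)
            if dirty:
                self.memory[paddr] = value
                self.writebacks += 1

    def read(self, pid, vaddr):
        key = (pid, vaddr)
        if key not in self.lines:  # PID mismatch counts as a miss
            paddr = self.page_map[key]
            self._evict_aliases(paddr)
            self.memory_reads += 1
            self.lines[key] = [self.memory[paddr], False]
        return self.lines[key][0]

    def write(self, pid, vaddr, value):
        self.read(pid, vaddr)                  # write-allocate on a miss
        self.lines[(pid, vaddr)] = [value, True]


# One shared page, mapped at the same virtual address in PIDs 1 (A) and 2 (B).
memory = {0x500: 0}
page_map = {(1, 0x2000): 0x500, (2, 0x2000): 0x500}
cache = PidCache(memory, page_map)

cache.write(1, 0x2000, 42)           # A writes: line is now dirty under PID 1
print(cache.read(2, 0x2000))         # B reads: 42, after a write-back
print(cache.writebacks, cache.memory_reads)  # 1 2
```

One logical read by B costs a write-back plus a memory read, even though the data was sitting in the cache the whole time.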
Does this matter?
	If the caches are really small, then not much, as far as I can tell,
	although there isn't a lot of data published on this kind of
	thing, for good reason. {If you know a lot about this, you're
	probably a vendor who treats this as serious competitive info.}
	However, high-performance systems don't have small caches,
	and the caches MUST continue to grow, since DRAM refuses to offer
	seriously-better access times.

Note that HP PA avoids most of this problem via the space identifiers,
which are, in some sense, a collection of PIDs or ASIDs.  Thus, if you
switch to a different process that's using the same program, the second
process's space-maps (or whatever they call them) can at least point
at the same code as did the first process, and there can be shared data
regions, etc, without requiring cache-misses to refill the cache.
The obvious issue that's left is doing fork-with-copy-on-write,
as that looks a bit difficult to do in the classical way.  Still, the
designers of the scheme OBVIOUSLY were thinking of multi-tasking environments with
efficiently-shared code and data, in the presence of large caches.
(Ross Bott of Pyramid gave a good talk at Uniforum that included some
discussions of cache issues in OLTP environments.)
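A toy comparison of the two tagging choices for shared program text, assuming an unbounded, fully-associative cache so that only tag matching matters. It does not model HP PA's actual space registers or how space identifiers get assigned; the line size, program size and the space id value 7 are made-up parameters.

```python
# Toy comparison: per-process PIDs vs. a shared space identifier for the
# text of a program two processes are running. The cache is modeled as an
# unbounded set of (id, line) tags, so only tag matching matters.

LINE_SIZE = 16     # bytes per cache line (assumed)
CODE_BYTES = 1024  # size of the shared program text (assumed)


def fetch_program(cache_tags, tag_id):
    """Fetch every line of the program text once; return the miss count."""
    misses = 0
    for vaddr in range(0, CODE_BYTES, LINE_SIZE):
        tag = (tag_id, vaddr // LINE_SIZE)
        if tag not in cache_tags:
            misses += 1
            cache_tags.add(tag)
    return misses


# Simple PID scheme: A runs as PID 1, then B as PID 2, same code.
pid_cache = set()
a_misses = fetch_program(pid_cache, tag_id=1)
b_misses = fetch_program(pid_cache, tag_id=2)
print(a_misses, b_misses)        # 64 64

# Space-id scheme: both processes map the text into one space (id 7 is
# a made-up value), so B finds A's lines already in the cache.
TEXT_SPACE = 7
space_cache = set()
a_misses = fetch_program(space_cache, TEXT_SPACE)
b_misses = fetch_program(space_cache, TEXT_SPACE)
print(a_misses, b_misses)        # 64 0
```

The PID case also leaves two copies of the same 64 lines in the cache, which a real, finite cache would have to pay for in capacity.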

Oh, the one other program that's real important is the UNIX kernel
itself, which, in some sense, is often treated as a giant shared
library.  It has terrible locality, and so you have to kick and
scratch to get every % of hit rate you can.  Note that the
most straightforward of PID schemes causes considerable extra
thrashing around in the presence of serious context-switch rates.
The typical choices come down to letting the kernel use the same PID as
the current user program, which means you communicate well with it,
or using a specific pid for the kernel itself, which means better
hit rates for the kernel code itself, but more overhead in dealing
with user processes, or combinations that switch back and forth.
(maybe somebody that does this kind of thing can post some useful info)
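A rough sketch of the kernel-text side of that tradeoff, assuming an unbounded cache of (pid, line) tags and a workload where every process traps into the same kernel code each time it runs. `KERNEL_PID` and all sizes are hypothetical, and the other side of the tradeoff (the kernel touching user data under a different PID) is deliberately not modeled.

```python
# Toy comparison of the two kernel-PID policies described above, counting
# only kernel-text misses in an unbounded cache of (pid, line) tags.
# KERNEL_PID and all sizes are assumptions; the extra cost of the kernel
# reaching user data under a different PID is not modeled.

KERNEL_PID = 0  # hypothetical id reserved for the kernel


def kernel_code_misses(num_processes, kernel_lines, rounds, dedicated_kernel_pid):
    cache = set()
    misses = 0
    for _ in range(rounds):
        for user_pid in range(1, num_processes + 1):
            # Switch to user_pid; it then traps into the kernel, which
            # runs either under its own PID or under the user's PID.
            pid = KERNEL_PID if dedicated_kernel_pid else user_pid
            for line in range(kernel_lines):
                if (pid, line) not in cache:
                    misses += 1
                    cache.add((pid, line))
    return misses, len(cache)


print(kernel_code_misses(8, 100, 10, dedicated_kernel_pid=False))  # (800, 800)
print(kernel_code_misses(8, 100, 10, dedicated_kernel_pid=True))   # (100, 100)
```

With the kernel sharing the user's PID, each of the 8 processes drags in its own copy of identical kernel text; in a real, bounded cache those duplicate copies would also crowd out other lines and tend to push the miss count higher still.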

Of course, numerous variations are possible, but the basic idea
is: if you're using simple virtual caches, with no PIDs, or
even with simple PIDs, you're probably thinking more about single-user
performance than you are about multi-tasking performance.
(There's nothing wrong with that tradeoff, of course, if that's what
you're trying to do.)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086