Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: Notesfiles $Revision: 1.7.0.10 $; site ccvaxa
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxn!ihnp4!inuxc!pur-ee!uiucdcs!ccvaxa!aglew
From: aglew@ccvaxa.UUCP
Newsgroups: net.arch
Subject: Re: How Many Virtual Spaces (minor
Message-ID: <5100065@ccvaxa>
Date: Sun, 20-Apr-86 23:23:00 EST
Article-I.D.: ccvaxa.5100065
Posted: Sun Apr 20 23:23:00 1986
Date-Received: Wed, 23-Apr-86 14:00:31 EST
References: <2089@peora.UUCP>
Lines: 125
Nf-ID: #R:peora.UUCP:2089:ccvaxa:5100065:000:5983
Nf-From: ccvaxa.UUCP!aglew    Apr 20 22:23:00 1986


I'm afraid this is another long post. I go into painful detail about the
advantages of one virtual address space for multiple cache systems;
essentially, the advantages are not qualitative, only quantitative, in that
they permit you to run a multiple cache system a bit faster with less
hardware.

>/* Written  2:52 pm  Apr 18, 1986 by petolino@chronon.UUCP */
>Sharing writable memory among address spaces in a cached system is indeed
>tricky.  As the above postings suggest, problems of consistency arise
>not only between copies of the same physical memory location residing in
>different caches in the system, but also among copies of a physical memory
>location residing in different places in a single cache (by 'consistency'
>we mean making sure that all cached copies of a physical memory location
>accurately reflect any writes that have been done to that location - the
>terms 'cache coherence' and 'data integrity' have also been used for this
>concept).
>
>Many solutions to these problems have been implemented and/or proposed.
(Paraphrased): (1) No shared memory; (2) don't cache shared memory; 
(3) single cache location (doesn't help inter-cache); (4) temporarily `own' cached
shared memory locations.

I don't think you mentioned (unless that was what you meant by number 4) the
widely used multiple cache synchronization technique of write-through,
either with invalidate, or with update.

Add to this the Snoopy cache techniques from Xerox which attempt to avoid
unnecessary write-throughs.

The write-through techniques all involve having the caches listen to the
system bus (at the same time as they are responding to their own processor's
requests) and updating their own entries that correspond to traffic they see
on the bus. Yes, this requires multiporting. Update according to bus traffic
requires two full write ports; invalidate means that you only have to write
one bit, which the processor's cache port only reads, so it's considerably
less expensive.
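
To make the invalidate/update distinction concrete, here is a minimal sketch
of write-through caches snooping a shared bus. It is only a toy model, not
anybody's actual hardware: the names Bus, SnoopingCache and policy are made
up for illustration, and the "ports" are just method calls.

```python
# Toy model of write-through caches that snoop a shared bus.
# All names here (Bus, SnoopingCache, policy) are hypothetical.

class Bus:
    """Shared system bus: every write goes to memory and is seen by all other caches."""
    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, source, addr, value):
        self.memory[addr] = value            # write-through keeps memory current
        for cache in self.caches:
            if cache is not source:
                cache.snoop(addr, value)     # bus-side port of every other cache

    def read(self, addr):
        return self.memory.get(addr, 0)


class SnoopingCache:
    def __init__(self, bus, policy):
        assert policy in ("invalidate", "update")
        self.bus, self.policy = bus, policy
        self.data = {}      # addr -> cached value
        self.valid = {}     # addr -> valid bit
        bus.caches.append(self)

    def load(self, addr):
        if not self.valid.get(addr, False):  # miss: refill from memory
            self.data[addr] = self.bus.read(addr)
            self.valid[addr] = True
        return self.data[addr]

    def store(self, addr, value):
        self.data[addr] = value
        self.valid[addr] = True
        self.bus.write(self, addr, value)    # every store appears on the bus

    def snoop(self, addr, value):
        if addr not in self.data:
            return                           # not cached here, nothing to do
        if self.policy == "invalidate":
            self.valid[addr] = False         # bus port writes only one bit
        else:
            self.data[addr] = value          # bus port writes the whole data word
```

With the invalidate policy the bus side only ever clears the valid bit, which
the processor side merely reads; with update, both sides write the data
array, which is where the second full write port comes from.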

Whatever the scheme, you have to be associating on the addresses that you
see at both ports. Consider these configurations:

    Processor A					    Processor B
	Processor A's virtual address			virtual address
	Physical address				physical address
	     |----------------Physical Memory-------------------|

There are two places you can put caches for any particular processor: 
before the virtual address gets translated, or after. 

   Processor --virtual address--> :cache: --physical address--> Physical memory
   Processor --virtual address--> --physical address--> :cache: Physical memory

If you are only dealing with one processor that doesn't even have to worry
about things like DMA, and if your virtual mapping doesn't change very often
(depends on your application mix), then there is an advantage in caching on
virtual addresses rather than physical, since you can simply start the cache
at the same time as you start your translation (using your TLB or whatever)
and you can reduce the cycle time for cache hits by that much. I.e., caching
on virtual addresses can let you go a bit faster.
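
As a back-of-the-envelope illustration of that saving, here is a tiny timing
model. The function names and the cycle counts are assumptions picked for the
example, not measurements of any machine.

```python
# Hypothetical hit-time model; the numbers used below are illustrative only.

def hit_time_physical(t_translate, t_cache):
    """Physically addressed cache: translate first, then look up."""
    return t_translate + t_cache

def hit_time_virtual(t_translate, t_cache):
    """Virtually addressed cache: the lookup starts at once with the virtual
    address; translation runs alongside and is not needed on a hit."""
    return t_cache

# Assumed figures: 1 cycle to translate, 2 cycles for the cache lookup.
saved = hit_time_physical(1, 2) - hit_time_virtual(1, 2)
```

Under these assumed figures the saving per hit is exactly the translation
time, which is the "that much" above.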

When you have multiple caches, if you want to do bus monitoring, you have
three possible configurations: 

(1) PP caching on your processor's physical addresses and listening to
physical addresses on the bus:

   Processor --virtual address--> --physical address--> :cache: Physical memory
							    \___/
							    /	\
   Processor --virtual address--> --physical address--> :cache: Physical memory

(2) VP caching on your processor's virtual addresses, listening to
physical addresses on the bus:

   Processor --virtual address--> :cache: --physical address--> Physical memory
				     \______/
				     /	    \
   Processor --virtual address--> :cache: --physical address--> Physical memory

(3) VV caching on your processor's virtual addresses and listening to
other processor's virtual addresses:

   Processor --virtual address--> :cache: --physical address--> Physical memory
                             \______/
	                     /	    \
   Processor --virtual address--> :cache: --physical address--> Physical memory

(The fourth possibility, caching on physical addresses while listening to
virtual addresses, has no advantages).

PP is what is usually done, but it has the same `problem' (whether it is a
problem depends on how much you are worried about speed) that the cache
doesn't get started until the address has been translated.

VP is alright, in that it means that you can start responding to your own
processor before translating the address, but it doubles the complexity of
your cache in that you have to be able to associate on both physical and
virtual addresses.
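
A rough sketch of what "associate on both" means for VP: the same cache line
has to be findable by virtual address from the processor side and by physical
address from the bus side. The class and method names are hypothetical, and
the two dictionaries stand in for two sets of tag comparators.

```python
# Hypothetical VP cache model: two lookup structures over the same lines.

class VPCache:
    def __init__(self):
        self.by_virtual = {}    # virtual addr  -> line (processor port)
        self.by_physical = {}   # physical addr -> same line (bus port)

    def fill(self, vaddr, paddr, value):
        line = {"vaddr": vaddr, "paddr": paddr, "value": value, "valid": True}
        self.by_virtual[vaddr] = line
        self.by_physical[paddr] = line       # the extra associative lookup

    def processor_read(self, vaddr):
        """Processor port: hit or miss is decided on the virtual address."""
        line = self.by_virtual.get(vaddr)
        if line is not None and line["valid"]:
            return line["value"]
        return None                          # miss

    def snoop_write(self, paddr):
        """Bus port: a write seen on the bus invalidates by physical address."""
        line = self.by_physical.get(paddr)
        if line is not None:
            line["valid"] = False
```

The second dictionary is the doubled complexity: every line needs a tag that
can be matched from either side.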

VV gives you faster response to your own processor by starting the cache
lookup before address translation begins. It works like this:
    Give virtual address to cache
    If a miss then
	put virtual address on cache synchronization bus
	|| put physical address on physical memory bus
	|| if a write, put data on data bus
where || means operations performed in parallel. The speed advantage isn't
carried over to the bus; the advantage over VP is only that both cache
ports associate on the same thing, the virtual address. But VV only works
if all processors share the same virtual address space.
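
The VV steps above can be sketched roughly as follows. This is a toy model
under stated assumptions: write-through with invalidate, one page table
shared by every processor (PAGE_TABLE and its contents are invented), and
the parallel bus operations written out one after another because this is
software, not hardware.

```python
# Hypothetical VV cache model: one shared virtual address space.

PAGE = 4096
PAGE_TABLE = {0: 7, 1: 3}   # assumed shared mapping: virtual page -> physical page

def translate(vaddr):
    return PAGE_TABLE[vaddr // PAGE] * PAGE + vaddr % PAGE

class System:
    def __init__(self):
        self.memory = {}    # physical addr -> value
        self.caches = []

class VVCache:
    def __init__(self, system):
        self.system = system
        self.lines = {}     # virtual addr -> value; the only tag store needed
        system.caches.append(self)

    def load(self, vaddr):
        if vaddr in self.lines:
            return self.lines[vaddr]         # hit: translation never consulted
        # miss: physical address goes out on the physical memory bus
        value = self.system.memory.get(translate(vaddr), 0)
        self.lines[vaddr] = value
        return value

    def store(self, vaddr, value):
        self.lines[vaddr] = value
        # virtual address on the cache synchronization bus ...
        for other in self.system.caches:
            if other is not self:
                other.lines.pop(vaddr, None) # invalidate by virtual address
        # ... in parallel (in hardware) with physical address and data on the memory bus
        self.system.memory[translate(vaddr)] = value
```

Invalidating by virtual address is only correct here because every cache
translates a given virtual address to the same physical location.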

---

Please, don't anyone say `who cares about the little increment of speed you
get from virtual caching?'. Caching is purely about speed - if you're not
worried about speed, don't do caching at all. 

No, sorry, that's not quite just - some people may be happy with the
increment gained from PP caching, and not need to worry about getting that
little bit faster (particularly since a good cache is faster than many of
the microprocessors it might get attached to). But if your goals are speed,
speed, and more speed, then the little bit of gain from V* caching is
tempting, while the extra size necessary for VP caching turns you off.

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms