Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site dartvax.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!decvax!dartvax!chuck
From: chuck@dartvax.UUCP (Chuck Simmons)
Newsgroups: net.arch
Subject: cache designs
Message-ID: <2571@dartvax.UUCP>
Date: Thu, 15-Nov-84 08:02:45 EST
Article-I.D.: dartvax.2571
Posted: Thu Nov 15 08:02:45 1984
Date-Received: Sat, 17-Nov-84 05:55:29 EST
Distribution: net
Organization: Dartmouth College, Hanover, NH
Lines: 52

I don't know much about architecture, so try not to be too harsh
with me as I parade my ignorance...

All the caching schemes I have ever heard of seem to work on a
demand-paged basis.  When a piece of information is asked for,
it is sucked into the cache, replacing the least recently used
piece of data in the cache.  (Here I am not making any distinctions
between a cache used to speed up main memory and a cache used to speed
up accesses to a disk.)
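
For concreteness, here is a minimal sketch of the demand-fetched, least
recently used scheme described above.  The class name, the dict standing
in for main memory or a disk, and the tiny two-entry capacity are all
assumptions made up for illustration, not any real machine's design.

```python
from collections import OrderedDict


class DemandLRUCache:
    """Demand-fetched cache with least-recently-used replacement."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store  # dict standing in for memory or disk
        self.lines = OrderedDict()          # least recently used entry first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
            return self.lines[address]
        # Miss: the requester waits here while the data is fetched.
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used
        value = self.backing_store[address]
        self.lines[address] = value
        return value


memory = {addr: addr * 10 for addr in range(8)}  # hypothetical contents
cache = DemandLRUCache(capacity=2, backing_store=memory)
values = [cache.read(a) for a in [0, 1, 0, 2, 1]]
```

Nothing is ever brought in ahead of time here, so every first touch of
an address is a miss.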

One problem with this approach is that when the processor realizes
that a desired piece of data is not in cache, then the processor must
wait until the data can be read into cache.

Now it seems to me that human brains don't work this way.  Rather,
human brains are more associative.  It seems to me that within the
brain, each piece of information points to other pieces of information
which are often used in association with the first piece of information.
Thus, while part of the brain is cogitating one hunk of data, other parts
of the brain can start swapping in pieces of data that will soon be
needed.

Now suppose we had a cache that was much more under a programmer's control.
To be concrete, suppose we have a cache of, say, 32 elements each containing
32 words.  And suppose our processor has a load cache instruction with
syntax:

    load cache <cache address> <memory address>

When this instruction is executed, the instruction execution processor
quickly tells the cache processor to pick up some data from memory.
Now, while the cache processor is picking up the data, the instruction
processor continues executing instructions which are already in cache.
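
A toy timing model may make the overlap clearer.  Everything in it is an
assumption for illustration: the 10 cycle fill time, the one cycle cost
of issuing the instruction, the rule that the cache processor handles
one fill at a time, and all of the names.

```python
MEMORY_LATENCY = 10  # assumed cycles for one 32-word fill; not from the post


class ProgrammedCache:
    """Toy timing model of a cache filled only by an explicit instruction."""

    def __init__(self, num_slots=32, block_words=32):
        self.block_words = block_words
        self.tag = [None] * num_slots    # memory block held by each slot
        self.ready_at = [0] * num_slots  # cycle at which each slot's fill ends
        self.port_free_at = 0            # cache processor does one fill at a time
        self.now = 0
        self.stall_cycles = 0

    def load_cache(self, slot, memory_address):
        """Issue 'load cache <slot> <memory_address>' and return at once."""
        self.now += 1  # assumed one cycle to hand the request over
        start = max(self.now, self.port_free_at)
        self.tag[slot] = memory_address - memory_address % self.block_words
        self.ready_at[slot] = start + MEMORY_LATENCY
        self.port_free_at = self.ready_at[slot]

    def execute(self, cycles):
        """Run instructions that are already in cache."""
        self.now += cycles

    def access(self, slot):
        """Touch a slot, stalling only if its fill has not finished."""
        if self.ready_at[slot] > self.now:
            self.stall_cycles += self.ready_at[slot] - self.now
            self.now = self.ready_at[slot]
        self.now += 1


early = ProgrammedCache()
early.load_cache(0, 70)  # word 70 lives in the block starting at word 64
early.execute(12)        # useful work overlaps the fill
early.access(0)

late = ProgrammedCache()
late.load_cache(0, 70)
late.access(0)           # touched immediately, so the fill is not hidden
```

In this model the early request costs no stall cycles at all, while the
late one waits out the full fill time.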

Using this instruction, I could imagine a very careful programmer helping
the system to obtain cache hit rates of at least 90%.  For example, at
the beginning of each 32-word chunk of code, the programmer might load
the next 32-word chunk of code into cache, a chunk of code that might
be branched to, and a few memory locations that would soon be accessed.
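
As a rough sketch of that pattern, the function below counts stall
cycles for straight-line code, with and without a load of the next
chunk issued at the top of each chunk.  It assumes one word executes per
cycle, ignores the cost of the load cache instruction itself, and does
not model branch targets or data; the latencies are made-up numbers.

```python
def stall_cycles(num_chunks, chunk_words=32, latency=10, prefetch=False):
    """Cycles spent waiting for code chunks that are not yet in cache."""
    now = 0
    stalls = 0
    ready_at = {}  # chunk number -> cycle at which its fill completes
    for chunk in range(num_chunks):
        if chunk not in ready_at:
            ready_at[chunk] = now + latency      # demand fetch on a miss
        if ready_at[chunk] > now:
            stalls += ready_at[chunk] - now      # wait for the fill
            now = ready_at[chunk]
        if prefetch and chunk + 1 < num_chunks:
            ready_at[chunk + 1] = now + latency  # load the next chunk early
        now += chunk_words                       # execute this chunk
    return stalls


demand = stall_cycles(8)
hinted = stall_cycles(8, prefetch=True)
slow_memory = stall_cycles(8, latency=40, prefetch=True)
```

Under these assumptions the early load hides every fill but the first,
as long as a fill takes no longer than executing one chunk.  With a
40 cycle fill, each later chunk still waits a few cycles.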

So maybe someone out there can tell me about interesting cache designs,
or tell me why a cache such as I have described wouldn't work (the cache
processor conflicts with the instruction processor?  a compiler could
never use the load cache instruction effectively?  no programmer in
her right mind would want to bother with the instruction?).

You bring the flames, I'll bring the marshmallows.

dartvax!chuck