Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site ucbcad.UUCP
Path: utzoo!linus!decvax!tektronix!ucbcad!ucbesvax.turner
From: ucbesvax.turner@ucbcad.UUCP
Newsgroups: net.arch
Subject: uP caches, cont'd. - (nf)
Message-ID: <1041@ucbcad.UUCP>
Date: Thu, 15-Dec-83 01:17:17 EST
Article-I.D.: ucbcad.1041
Posted: Thu Dec 15 01:17:17 1983
Date-Received: Sun, 11-Dec-83 01:07:13 EST
Sender: notes@ucbcad.UUCP
Organization: UC Berkeley CAD Group
Lines: 51

#N:ucbesvax:27900003:000:2645
ucbesvax!turner    Dec  8 12:23:00 1983

I don't like the idea of putting registers in an on-board cache memory
(and then translating register references to full memory addresses).
Some reasons why:

- it increases the amount of control logic required to interpret a
  register reference.  One must not only extract the reference, but
  add it to a full-address-space pointer, and hand it through the
  cache-address translator.  As we will see below, this might
  involve serializing register access--involving yet more control
  logic.

- one advantage of a true register file is that one can use dual-ported
  memory to gain speed by allowing overlapped fetches.  Making a whole
  cache (~256..~4K bytes) out of dual-ported memory would be rather
  expensive.  The only other way to achieve overlapped fetching in
  the cache would be to interleave the cache RAM--and that's only a
  statistical speed-up.  There will still be cases where register access
  must be serialized *unless* the interleave factor is equal to the number
  of registers.  This seems like a high cost to pay just to get register-
  to-register operations that are (nearly) as fast as they are in
  processors that don't map registers to memory.  One does NOT contort the
  design of a cache around the architecture!  In fact, I am in favor of
  quite the opposite, for the special case of single-chip microprocessors:
  violate the rule of transparency to the extent of adding instructions that
  address issues of control and optimization of caches, then contort the
  compiler (somewhat) around these instructions.

- runaway pointers can trash your whole context, making it very hard
  to debug programs with that problem.  Sure, you could trap such
  accesses if they were inappropriate.  But again, that means clapping
  on some special frob to test for indirect addressing of register-
  mapped memory.  With a special supervisor control bit, perhaps, so
  that one *can* do it when one wants to.  And a partridge in a pear
  tree.  It all adds up.
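
To put a rough number on the "statistical speed-up" point in the second
item, here is a small sketch.  Everything in it is my own assumption for
illustration, not a description of any real part: 16 registers, mapped to
consecutive cache words, low-order interleaving (register r lives in bank
r mod B), and two distinct source registers picked uniformly at random.
The function name conflict_fraction is hypothetical.

```python
from itertools import permutations

def conflict_fraction(num_registers, num_banks):
    """Fraction of ordered pairs of distinct registers that fall in the
    same bank, i.e. whose two fetches would have to be serialized.

    Assumes register r maps to bank (r mod num_banks); this mapping is
    an illustrative assumption, not taken from any real design.
    """
    pairs = list(permutations(range(num_registers), 2))
    conflicts = sum(1 for a, b in pairs if a % num_banks == b % num_banks)
    return conflicts / len(pairs)

if __name__ == "__main__":
    # Sweep the interleave factor for a hypothetical 16-register machine.
    for banks in (1, 2, 4, 8, 16):
        print(banks, conflict_fraction(16, banks))
```

Under these assumptions, a 4-way interleave still serializes one operand
pair in five, and the conflicts only vanish when the interleave factor
reaches the number of registers.  Real operand usage is not uniform, so
treat the figures as a shape, not a prediction.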

Assuming that this discussion is concerned ONLY with the kind of cache
one puts on a single-chip microprocessor, I think people should realize
that you don't just say "oh, and let's add this".  On a chip, everything
steals something from everything else.  (In a TTL design, maybe you just
have to beef up the power supply a little to add new features.  Eventually
you run out of board space.  Bare silicon is a rather different medium.)

Don't let VLSI and its small packages fool you.  To take full advantage
of a million transistors on one die is going to be at least as hard as
designing Cray machines.
---
Michael Turner (ucbvax!ucbesvax.turner)