Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!lll-tis!ames!claris!apple!bcase
From: bcase@Apple.COM (Brian Case)
Newsgroups: comp.arch
Subject: Re: 16 & 32 bit vs 32 bit only instruct
Message-ID: <7519@apple.Apple.Com>
Date: 1 Mar 88 19:17:11 GMT
References: <2574@im4u.UUCP> <9728@steinmetz.steinmetz.UUCP>
Reply-To: bcase@apple.UUCP (Brian Case)
Organization: Ungermann-Bass Enterprises
Lines: 35

In article <9728@steinmetz.steinmetz.UUCP> sunset!oconnor@steinmetz.UUCP writes:
>Many operations in load-store machines are of the load-it,
>modify-it, maybe modify-it-again, then maybe store-it. These
>types of operations will never want three-address formats.
>The original (destroyed in two-address) value is never reused.
>Our research indicated that this was the most common case.
>For these types of data, dependencies can't be avoided.

In my experience, just the opposite is true; er, that is the opposite of
"The original value is never reused" is true.  Yes, it is true that many
operations are like "load-it, modify, store-it-back" but reuse is, to me,
one of the *MAIN* benefits of RISC architectures.  Marty Hopkins said it
pretty well in some short papers.  Lots of registers and three-address
operations facilitate reuse.  If having a three address format reduces
the instruction (cycle) count in your inner loops from 10 to 9, you
potentially have 10% better performance.  If the inner loops go from 5
to 4 instructions, it's even better (20% fewer).  Three address instructions don't
have to be terribly frequently used to be very important.
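
To make the reuse argument concrete, here is a rough sketch (Python, purely
illustrative).  It counts how many instructions a straight-line block needs
on a two-address machine, under assumptions of my own choosing: every
operation is commutative, one instruction costs one cycle, and a register
copy is needed only when the original values of BOTH source operands are
read again later.  The names two_address_count, reused_later, and
cycle_reduction are hypothetical, not from any real tool.

```python
def reused_later(name, ops, i, live_at_exit):
    """True if the value that `name` holds at op i is read again after op i."""
    for dest, src1, src2 in ops[i + 1:]:
        if name in (src1, src2):
            return True           # read again before being overwritten
        if dest == name:
            return False          # overwritten before any further read
    return name in live_at_exit   # still wanted after the block ends


def two_address_count(ops, live_at_exit=frozenset()):
    """Instructions needed when each op must overwrite one of its sources.

    ops is a list of (dest, src1, src2) three-address operations.
    Assumption: if either source is dead after the op, the result can
    simply overwrite it; otherwise one extra copy instruction is needed.
    """
    total = 0
    for i, (dest, src1, src2) in enumerate(ops):
        needed = [s != dest and reused_later(s, ops, i, live_at_exit)
                  for s in (src1, src2)]
        total += 2 if all(needed) else 1
    return total


def cycle_reduction(before, after):
    """Fraction of cycles saved, assuming one cycle per instruction."""
    return (before - after) / before


# A made-up loop body in which `a` and `b` are both reused after the first op.
body = [("t", "a", "b"), ("u", "a", "c"), ("v", "t", "u")]
print(len(body), two_address_count(body, {"b", "v"}))
print(cycle_reduction(10, 9), cycle_reduction(5, 4))
```

In this made-up block the three-address form needs three instructions and
the two-address form needs four, because the first add must preserve both
of its inputs.  If no value is ever reused, the two counts are equal, which
is the case the quoted article describes.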

>] But, if you don't use them, why pay for them? Why not have a
>] decoded instruction cache that takes a compact representation
>] and generates the canonical form? It doesn't have to be as fancy
>] as CRISP - Patterson's group had a paper on this.
>
>It adds latency. Especially it adds to the latency of a branch. Either
>you will have more post-branch slots to fill, or you will have a more
>expensive cache-miss penalty. Branches seem to be about one-tenth
>of all instructions.

This is the one big lose with decoded instruction caches.  A smaller lose
is the size; in the same area, you could have had probably a 2x size
encoded instruction cache.  Depending upon the actual sizes involved, the
2x size difference may not have much effect (rule of thumb:  doubling the
cache size will halve the miss rate.  Sort of.).  An 8K instruction cache
is probably not much worse than a 16K instruction cache (depending on
lots of things, of course...).
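
For what the rule of thumb implies, a back-of-the-envelope sketch (Python,
illustrative only).  It takes "doubling the size halves the miss rate"
literally, so miss rate is inversely proportional to cache size, and uses
the usual hit-time-plus-miss-penalty model for average fetch time.  The 2%
base miss rate and the 10-cycle miss penalty are numbers I made up for the
example, and the function names are hypothetical.

```python
def scaled_miss_rate(base_rate, base_size, size):
    """Miss rate at `size`, given `base_rate` at `base_size`.

    Takes the rule of thumb literally: miss rate is inversely
    proportional to cache size.  Capped at 1.0 (every access misses).
    """
    return min(1.0, base_rate * base_size / size)


def avg_fetch_cycles(miss_rate, hit_cycles, miss_penalty):
    """Average cycles per instruction fetch: hit time plus expected penalty."""
    return hit_cycles + miss_rate * miss_penalty


# Assumed figures, not measurements: 2% misses at 16K, 10-cycle miss penalty.
for size_k in (8, 16, 32):
    rate = scaled_miss_rate(0.02, 16, size_k)
    print(size_k, rate, avg_fetch_cycles(rate, 1, 10))
```

Whether the 8K result counts as "not much worse" depends almost entirely on
the base miss rate and the miss penalty you plug in, which is the "lots of
things" caveat.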