Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!lll-tis!ames!claris!apple!bcase
From: bcase@Apple.COM (Brian Case)
Newsgroups: comp.arch
Subject: Re: 16 & 32 bit vs 32 bit only instruct
Message-ID: <7519@apple.Apple.Com>
Date: 1 Mar 88 19:17:11 GMT
References: <2574@im4u.UUCP> <9728@steinmetz.steinmetz.UUCP>
Reply-To: bcase@apple.UUCP (Brian Case)
Organization: Ungermann-Bass Enterprises
Lines: 35

In article <9728@steinmetz.steinmetz.UUCP> sunset!oconnor@steinmetz.UUCP writes:
>Many operations in load-store machines are of the load-it,
>modify-it, maybe modify-it-again, then maybe store-it. These
>types of operations will never want three-address formats.
>The original (destroyed in two-address) value is never reused.
>Our research indicated that this was the most common case.
>For these types of data, dependencies can't be avoided.

In my experience, just the opposite is true; er, that is, the opposite
of "The original value is never reused" is true.  Yes, it is true that
many operations are like "load-it, modify, store-it-back" but reuse is,
to me, one of the *MAIN* benefits of RISC architectures.  Marty Hopkins
said it pretty well in some short papers.  Lots of registers and
three-address operations facilitate reuse.  If having a three-address
format reduces the instruction (cycle) count in your inner loops from
10 to 9, you potentially have 10% better performance.  If the inner
loops go from 5 to 4 instructions, it's even better.  Three-address
instructions don't have to be terribly frequently used to be very
important.

>] But, if you don't use them, why pay for them? Why not have a
>] decoded instruction cache that takes a compact representation
>] and generates the canonical form? It doesn't have to be as fancy
>] as CRISP - Patterson's group had a paper on this.
>
>It adds latency. Especially it adds to the latency of a branch. Either
>you will have more post-branch slots to fill, or you will have a more
>expensive cache-miss penalty.
>Branches seem to be about one-tenth
>of all instructions.

This is the one big lose with decoded instruction caches.  A smaller
lose is the size; in the same area, you could probably have had a
2x-size encoded instruction cache.  Depending upon the actual sizes
involved, the 2x size difference may not have much effect (rule of
thumb: doubling the cache size will halve the miss rate.  Sort of.).
An 8K instruction cache is probably not much worse than a 16K
instruction cache (depending on lots of things, of course...).
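
For anyone who wants to plug in their own numbers, here is a rough
back-of-the-envelope sketch of the two bits of arithmetic above: the
gain from shaving an instruction off an inner loop, and the cost of
halving the instruction cache under the "double the size, halve the
miss rate" rule of thumb.  The 2% miss rate, the 10-cycle miss penalty,
and all of the function names are assumptions made up for illustration;
none of them come from a real machine or from the posts quoted here.

```python
def loop_speedup(count_before, count_after):
    """Speedup factor when an inner loop shrinks from count_before to
    count_after instructions, assuming one cycle per instruction."""
    return count_before / count_after


def miss_rate_after_doublings(base_miss_rate, doublings):
    """Rule of thumb from the post: each doubling of cache size halves
    the miss rate. A negative value for doublings models a smaller cache."""
    return base_miss_rate / (2 ** doublings)


def avg_fetch_cycles(miss_rate, miss_penalty, hit_cycles=1.0):
    """Average cycles per instruction fetch for a simple one-level cache."""
    return hit_cycles + miss_rate * miss_penalty


# Hypothetical figures, chosen only to make the arithmetic concrete.
ASSUMED_MISS_RATE_16K = 0.02   # assumed miss rate of a 16K encoded cache
ASSUMED_MISS_PENALTY = 10      # assumed cycles lost per miss

encoded_16k = avg_fetch_cycles(ASSUMED_MISS_RATE_16K, ASSUMED_MISS_PENALTY)
decoded_8k = avg_fetch_cycles(
    miss_rate_after_doublings(ASSUMED_MISS_RATE_16K, -1),  # half the size
    ASSUMED_MISS_PENALTY,
)

if __name__ == "__main__":
    print("10 -> 9 instructions: speedup", round(loop_speedup(10, 9), 3))
    print("5 -> 4 instructions:  speedup", round(loop_speedup(5, 4), 3))
    print("16K encoded cache, avg fetch cycles:", round(encoded_16k, 3))
    print("8K decoded cache, avg fetch cycles: ", round(decoded_8k, 3))
```

With these made-up figures the smaller cache costs about 1.4 cycles per
fetch against about 1.2, and the loop that goes from 10 to 9
instructions runs roughly 11% faster (10% fewer cycles).  Change the
assumed miss rate or penalty and the gap between the two caches moves a
lot, which is the "depending on lots of things" part.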