Path: utzoo!mnetor!uunet!husc6!mit-eddie!ll-xn!ames!sgi!bron
From: bron@olympus.SGI.COM (Bron C. Nelson)
Newsgroups: comp.arch
Subject: Re: 16 & 32 bit vs 32 bit only instructions for RISC.
Message-ID: <12705@sgi.SGI.COM>
Date: 14 Mar 88 21:48:15 GMT
References: <9651@steinmetz.steinmetz.UUCP> <9678@steinmetz.steinmetz.UUCP> <15580@onfcanim.UUCP>
Sender: daemon@sgi.SGI.COM
Organization: Silicon Graphics Inc, Mountain View, CA
Lines: 36
Summary: How hard to decode really?


This topic has gone on for a while, but I haven't noticed (or have
managed to miss) the answer to the question I consider important:
how hard/expensive is it to decode 16 & 32 bit instructions vs 32 bit
only?

Several respondents have said "it's expensive" or "it takes more time
to decode the different formats" and even "fetching the registers in
parallel with doing the instruction decode is a big win."  (These are
probably not exact quotes.)  I sorta wonder at this.  Is the
instruction decode/register fetch a critical path?   From what I
can gather, the register fetch probably IS.  Can more hardware be
thrown at the problem to allow multiple formats?  If not, how
expensive (time-wise) is it really to provide?  It seems that if done
"right" (oh no! not that word!) it would only add 1 gate delay
(see below).  This is maybe 10%?  (What the heck IS the length of the
critical path (in gates) of your favorite cpu?).  At worst it would
add another stage to the pipeline (plus the associated hardware to
support an extra stage).  How expensive is that?  Maybe 10%??
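
The two "maybe 10%" guesses above can be put into rough numbers. The
following is only a back-of-the-envelope sketch: the function names,
the critical path lengths, and the taken-branch fraction are all
hypothetical, chosen for illustration rather than measured on any real
cpu. It assumes that one extra gate stretches the cycle in proportion
to the gate count of the critical path, and that an extra pipeline
stage costs one additional bubble per taken branch.

```python
def extra_gate_slowdown(critical_path_gates, extra_gates=1):
    """Fractional cycle-time increase from lengthening the critical path.

    Assumes every gate on the path contributes an equal delay
    (a simplification; real gate delays vary).
    """
    return extra_gates / critical_path_gates


def extra_stage_slowdown(base_cpi, taken_branch_fraction, extra_bubbles=1):
    """Fractional CPI increase from one more front-end pipeline stage.

    Assumes the only cost is extra_bubbles additional stall cycles on
    each taken branch (hypothetical model; it ignores delay slots and
    the added hardware).
    """
    return (taken_branch_fraction * extra_bubbles) / base_cpi


# Illustrative critical path lengths, not data from any real machine.
for gates in (10, 20, 30):
    print(f"{gates} gates: {extra_gate_slowdown(gates):.1%} longer cycle")

# Illustrative workload: CPI of 1.0, 10% of instructions are taken branches.
print(f"extra stage: {extra_stage_slowdown(1.0, 0.10):.1%} more cycles")
```

Under these assumptions the 10% figure for one gate delay corresponds
to a critical path of about 10 gates; a longer path makes the extra
gate proportionally cheaper.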

I'm just trying to get a feel for how much cpu performance people think
would have to be given up to get the more compact encoding.
-----------------------------------------------------------------------
Bron Nelson   bron@sgi.com
Don't blame my employers for my opinions.

p.s.  If we only have 2 formats, we can specify which one by using the
first bit in the instruction (much like CRISP uses the first bit(s)).
This should (?) let us select between the 2 possible register
encodings with only a single additional gate delay (and some more
silicon devoted to doing it).  What we buy is a 25%+ reduction in
program code size.  This seems like a good trade off to me since most
programs I run take longer to load off disk than they do to execute.
I admit my experience may not be typical, and taking a performance hit
may not be a smart marketing decision for a cpu house, but it seems
like a good system tradeoff.
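
The format-bit idea in the p.s. can be sketched in software as follows.
This is a minimal model under stated assumptions, not the encoding of
CRISP or of any real cpu: the bit positions, the field widths, and the
names decode_regs and code_size_ratio are all hypothetical. The point
it tries to show is that both candidate register fields can be
extracted unconditionally (in hardware, in parallel) and that the
first bit only drives a final 2-to-1 select, which is where the single
additional gate delay would come from.

```python
def decode_regs(word):
    """Return (is_short, rs, rt) for a 32-bit fetch word.

    Hypothetical layouts, chosen only for illustration:
      long  format (bit 31 = 0): rs in bits 25..21, rt in bits 20..16
      short format (bit 31 = 1): rs in bits 27..24, rt in bits 23..20
    """
    fmt = (word >> 31) & 1            # the "first bit" picks the format

    # Both extractions happen regardless of the format bit.
    long_rs, long_rt = (word >> 21) & 0x1F, (word >> 16) & 0x1F
    short_rs, short_rt = (word >> 24) & 0xF, (word >> 20) & 0xF

    # The 2-to-1 select: the only step that depends on the format bit.
    rs = short_rs if fmt else long_rs
    rt = short_rt if fmt else long_rt
    return bool(fmt), rs, rt


def code_size_ratio(short_fraction):
    """Code size relative to 32-bit-only, if short_fraction of the
    instructions fit in the 16-bit form."""
    return (16 * short_fraction + 32 * (1 - short_fraction)) / 32


long_word = (3 << 21) | (7 << 16)                 # long form, rs=3, rt=7
short_word = (1 << 31) | (5 << 24) | (9 << 20)    # short form, rs=5, rt=9
print(decode_regs(long_word))
print(decode_regs(short_word))
print(code_size_ratio(0.5))
```

In this model a 25% reduction in code size corresponds to half of all
instructions fitting the 16-bit form; whether real programs reach that
mix is exactly the kind of number the question is asking about.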