Path: utzoo!mnetor!uunet!husc6!mit-eddie!ll-xn!ames!sgi!bron
From: bron@olympus.SGI.COM (Bron C. Nelson)
Newsgroups: comp.arch
Subject: Re: 16 & 32 bit vs 32 bit only instructions for RISC.
Message-ID: <12705@sgi.SGI.COM>
Date: 14 Mar 88 21:48:15 GMT
References: <9651@steinmetz.steinmetz.UUCP> <9678@steinmetz.steinmetz.UUCP> <15580@onfcanim.UUCP>
Sender: daemon@sgi.SGI.COM
Organization: Silicon Graphics Inc, Mountain View, CA
Lines: 36
Summary: How hard to decode really?

This topic has gone on for a while, but I haven't noticed (or managed
to miss) the answer to the question I consider important: i.e. how
hard/expensive is it to decode 16 & 32 bit instructions vs 32 bit only?

Several respondents have said "it's expensive" or "it takes more time
to decode the different formats" and even "fetching the registers in
parallel with doing the instruction decode is a big win." (These are
probably not exact quotes.)

I sorta wonder at this. Is the instruction decode/register fetch a
critical path? From what I can gather, the register fetch probably IS.
Can more hardware be thrown at the problem to allow multiple formats?
If not, how expensive (time-wise) is it really to provide? It seems
that if done "right" (oh no! not that word!) it would only add 1 gate
delay (see below). This is maybe 10%? (What the heck IS the length of
the critical path (in gates) of your favorite cpu?) At worst it would
add another stage to the pipeline (plus the associated hardware to
support an extra stage). How expensive is that? Maybe 10%??

I'm just trying to get a feel for how much cpu performance people think
would have to be given up to get the more compact encoding.

-----------------------------------------------------------------------
Bron Nelson     bron@sgi.com
Don't blame my employers for my opinions.

p.s. If we only have 2 formats, we can specify which one by using the
first bit in the instruction (much like CRISP uses the first bit(s)).
This should (?)
let us select between the 2 possible register encodings with only a
single additional gate delay (and some more silicon devoted to doing
it). What we buy is a 25%+ reduction in program code size. This seems
like a good tradeoff to me since most programs I run take longer to
load off disk than they do to execute. I admit my experience may not be
typical, and taking a performance hit may not be a smart marketing
decision for a cpu house, but it seems like a good system tradeoff.
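
To make the p.s. concrete, here is a rough software sketch of the idea: pull the register fields for BOTH formats out of the fetched bits at once, then let the first bit pick between them (the pick is the "one extra gate delay" mux). The bit layout, field widths, and function names below are invented for illustration only; they are not CRISP's encoding or that of any real cpu.

```python
# Hypothetical encoding, made up for this sketch (not any real ISA):
#   short (16 bits): [15]=0 | [14:10] opcode | [9:5]   rd | [4:0]   rs
#   long  (32 bits): [31]=1 | [30:25] opcode | [24:20] rd | [19:15] rs | [14:0] imm

def encode_short(op, rd, rs):
    """Build a 16-bit instruction; the format bit (bit 15) is left at 0."""
    return (op << 10) | (rd << 5) | rs

def encode_long(op, rd, rs, imm):
    """Build a 32-bit instruction; the format bit (bit 31) is set to 1."""
    return (1 << 31) | (op << 25) | (rd << 20) | (rs << 15) | imm

def decode(window):
    """Decode the instruction left-aligned in a 32-bit fetch window.

    Returns (length_in_bytes, rd, rs).
    """
    is_long = (window >> 31) & 1

    # Extract the register fields for both formats unconditionally.
    # In hardware these would be plain wires, available in parallel.
    short_rd, short_rs = (window >> 21) & 0x1F, (window >> 16) & 0x1F
    long_rd, long_rs = (window >> 20) & 0x1F, (window >> 15) & 0x1F

    # The format bit selects one candidate: a 2-to-1 mux per field bit.
    rd = long_rd if is_long else short_rd
    rs = long_rs if is_long else short_rs
    return (4 if is_long else 2), rd, rs

def relative_code_size(short_fraction):
    """Code size relative to 32-bit-only, given the fraction of
    instructions that fit in the 16-bit form."""
    return (2 * short_fraction + 4 * (1 - short_fraction)) / 4
```

For example, decode(encode_short(3, 7, 9) << 16) picks the short fields and reports a 2-byte instruction, while decode(encode_long(5, 2, 30, 100)) picks the long ones. Under these assumed sizes, relative_code_size(0.5) comes out to 0.75, i.e. the 25% reduction needs about half of all instructions to fit in the short form. None of this says anything about the real gate-level cost, which is the question being asked.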