Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!decwrl!decvax!ucbvax!pasteur!ames!hao!gatech!bloom-beacon!mit-eddie!killer!elg
From: elg@killer.UUCP (Eric Green)
Newsgroups: comp.arch
Subject: Re: 16 & 32 bit vs 32 bit only instructions for RISC.
Message-ID: <3508@killer.UUCP>
Date: 27 Feb 88 06:18:11 GMT
References: <2574@im4u.UUCP>
Organization: Bayou Telecommunications
Lines: 57

in article <2574@im4u.UUCP>, rajiv@im4u.UUCP (Rajiv N. Patel) says:
>>All you 32-bit instruction advocates : how many of your 32-bits of
>>instruction are usually wasted ( like by leading zeroes or ones, or
>>unused register specifications ) ? If it sounds like I'd welcome a
>>debate on the merits of 16 vs 32 bit instructions : sure. Isn't that
>    of the programs I have coded (<2K instructions) have about 70-90% of the
>    instructions from the 16 bit category. This only tells me that indeed 16
>    bit instructions are very useful and the additional amount of time which
>    one may incur in decoding 16 and 32 bit instructions could be offset by
>    the time saved in fetching instructions from memory/cache assuming a 32
>    bits wide bus. The fixed instruction architectures never seem to talk
>    about the memory traffic involved for getting all the leading zeros and
>    unnecessary third register name.

The operative question here is: is the bus width n times the instruction
width, for some n > 1? If so, then it doesn't matter how large the
instructions are. Each bus transfer brings in more than one instruction, so
you'll always be able to fetch instructions faster than you can execute them.
Take what you mentioned, a 32-bit bus with 16-bit instructions, or a 64-bit
bus with 32-bit instructions. Execution-time-wise, it doesn't matter either
way, unless there's additional decoding overhead (such as variable-length
instructions in the first case, but not in the second).
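
To put rough numbers on that ratio, here is a back-of-the-envelope sketch in
Python. The cycle counts (fetch_cycles, exec_cycles) are illustrative
parameters of my own, not measurements of any real machine, and the model
ignores branches and data traffic on the same bus.

```python
def instructions_per_transfer(bus_bits, insn_bits):
    """Whole instructions delivered by one bus transfer."""
    return bus_bits // insn_bits


def fetch_keeps_up(bus_bits, insn_bits, fetch_cycles=1, exec_cycles=1):
    """True if fetch supplies instructions at least as fast as they execute.

    fetch_cycles: cycles per bus transfer (assumed, illustrative).
    exec_cycles:  cycles per executed instruction (assumed, illustrative).
    """
    supplied = instructions_per_transfer(bus_bits, insn_bits)
    # Supply rate is supplied / fetch_cycles; demand rate is 1 / exec_cycles.
    return supplied * exec_cycles >= fetch_cycles


print(instructions_per_transfer(32, 16))       # 2
print(instructions_per_transfer(64, 32))       # 2
print(fetch_keeps_up(32, 16, fetch_cycles=2))  # True
print(fetch_keeps_up(32, 32, fetch_cycles=2))  # False
```

Both of the cases above come out to two instructions per transfer, which is
why only the decode cost separates them in this simple model.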

And then there is the case of cache. Whenever you fetch an opcode out of cache
memory, you have no delays anyhow.  Someone from ?Pyramid? posted about their
architecture a long time ago. The cache is a very important part of that
machine, and is integrated with a bus that's wider than the instruction/data
width: for example, 32-bit instructions & data, with a 192-bit-wide memory
bus. Each fetch brings in six instructions, so given locality of reference,
the next 5 instructions will already be in cache, available nearly instantly.
I fail to see how this can slow the machine down any, except for the i/o
overhead of loading the program into main memory in the first place (which is
not a cpu delay but, rather, a response-time delay during which some other
process is running).
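
The "next 5 instructions" figure can be checked with a toy model. This is
only a sketch under my own assumptions: straight-line code with no branches,
starting on a line boundary, and a single-entry buffer standing in for the
cache. The function name is hypothetical, and this is not a description of
the Pyramid design.

```python
def sequential_fetch(line_bits, insn_bits, n_insns):
    """Count hits and misses for straight-line code, one line held at a time."""
    per_line = line_bits // insn_bits  # instructions brought in per transfer
    cached_line = None
    hits = misses = 0
    for i in range(n_insns):
        line = i // per_line           # which wide line holds instruction i
        if line == cached_line:
            hits += 1                  # already fetched along with a neighbour
        else:
            misses += 1                # new wide transfer from memory
            cached_line = line
    return hits, misses


print(192 // 32)                       # 6
print(sequential_fetch(192, 32, 600))  # (500, 100)
```

Under those assumptions, five of every six instruction fetches are satisfied
without touching memory; branches and a real multi-line cache would change
the numbers.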

So, while 16-bit variable-length instructions with a 32-bit data bus may be a
win on a machine with slow memory access time and no cache (i.e. your typical
microcomputer, at the moment), 32-bit fixed-length instructions with a 64-bit
instruction-fetch memory interface will blow it into the weeds come
ultimate-performance time, because of the lack of instruction decode overhead.
It's all a matter of keeping the memory interface bigger than the instruction
size....
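
As a rough comparison of the memory traffic in the two designs, the sketch
below counts bus transfers for 1000 instructions. The 80% share of 16-bit
instructions is picked from inside the 70-90% range quoted above. The
function name is hypothetical, and the model assumes perfectly packed
instructions with no alignment waste, no branches, and no cache.

```python
def bus_transfers(bus_bits, n16=0, n32=0):
    """Bus transfers needed to fetch n16 16-bit and n32 32-bit instructions."""
    total_bits = 16 * n16 + 32 * n32
    return -(-total_bits // bus_bits)  # ceiling division


# Mixed 16/32-bit instructions (80% short) over a 32-bit bus.
print(bus_transfers(32, n16=800, n32=200))  # 600
# Fixed 32-bit instructions over the same 32-bit bus.
print(bus_transfers(32, n32=1000))          # 1000
# Fixed 32-bit instructions over a 64-bit fetch interface.
print(bus_transfers(64, n32=1000))          # 500
```

On the same 32-bit bus the short instructions cut traffic considerably, but
widening the fetch path to 64 bits more than recovers the difference, before
counting any decode cost.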

I also have some papers here from the original RISC guys at UC-Berkeley, and
AMD's design team, which discuss the issue at great length. Their basic
conclusion is that, given the locality of reference that large caches exploit,
32-bit fixed-length instructions are a big win even in the absence of a wide
memory interface. Of course, their basic problem was justifying the larger
flow of instructions in a RISC machine versus a CISC machine, rather than
specifically addressing variable-length vs. fixed-length instructions, but
their conclusions still apply. At least, if you accept the basic premise of
RISC, that is.

--
Eric Lee Green  elg@usl.CSNET     Asimov Cocktail,n., A verbal bomb
{cbosgd,ihnp4}!killer!elg              detonated by the mention of any
Snail Mail P.O. Box 92191              subject, resulting in an explosion
Lafayette, LA 70509                    of at least 5,000 words.