Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!decwrl!decvax!ucbvax!pasteur!ames!hao!gatech!bloom-beacon!mit-eddie!killer!elg
From: elg@killer.UUCP (Eric Green)
Newsgroups: comp.arch
Subject: Re: 16 & 32 bit vs 32 bit only instructions for RISC.
Message-ID: <3508@killer.UUCP>
Date: 27 Feb 88 06:18:11 GMT
References: <2574@im4u.UUCP>
Organization: Bayou Telecommunications
Lines: 57

in article <2574@im4u.UUCP>, rajiv@im4u.UUCP (Rajiv N. Patel) says:
>>All you 32-bit instruction advocates : how many of your 32-bits of
>>instruction are usually wasted ( like by leading zeroes or ones, or
>>unused register specifications ) ? If it sounds like I'd welcome a
>>debate on the merits of 16 vs 32 bit instructions : sure. Isn't that
> of the programs I have coded (<2K instructions) have about 70-90% of the
> instructions from the 16 bit category. This only tells me that indeed 16
> bit instructions are very useful and the additional amount of time which
> one may incur in decoding 16 and 32 bit instructions could be offset by
> the time saved in fetching instructions from memory/cache assuming a 32
> bits wide bus. The fixed instruction architectures never seem to talk
> about the memory traffic involved for getting all the leading zeros and
> unnecessary third register name.

The operative question here is: is the bus width n times the instruction
width, for some n > 1? If so, then it doesn't matter how large the
instructions are. Each bus transfer delivers n instructions, so you'll always
be able to fetch instructions faster than you can execute them. For example,
what you mentioned: a 32-bit bus with 16-bit instructions. Or a 64-bit bus
with 32-bit instructions. In terms of execution time, it doesn't matter either
way, unless there is additional decoding overhead (such as variable-length
instructions in the first case but not in the second).

Then there is the cache. Whenever you fetch an opcode out of cache memory,
there is essentially no fetch delay anyway. Someone from ?Pyramid?
posted about their architecture a long time ago. The cache is a very
important part of that machine, and it is integrated with a bus that's wider
than the instruction/data width: for example, 32-bit instructions and data
with a 192-bit-wide memory bus. Each 192-bit transfer brings in six 32-bit
instructions, so, given locality of reference, the next 5 instructions will
already be in cache, available (nearly) instantly. I fail to see how this can
slow the machine down at all, except for the I/O overhead of loading the
program into main memory in the first place (which is not a CPU delay but
rather a response-time delay, during which some other process is running).

So, while 16-bit variable-length instructions with a 32-bit data bus may be a
win on a machine with slow memory access and no cache (i.e., your typical
microcomputer at the moment), 32-bit fixed-length instructions with a 64-bit
instruction-fetch memory interface will blow it into the weeds come
ultimate-performance time, because they avoid the extra instruction-decode
overhead.

It's all a matter of keeping the memory interface wider than the instruction
size...

I also have some papers here from the original RISC guys at UC-Berkeley and
from AMD's design team, which discuss the issue at great length. Their basic
conclusion is that, given locality of reference and large caches, 32-bit
fixed-length instructions are a big win even without a wide memory interface.
Of course, their basic problem was justifying the larger flow of instructions
in a RISC machine compared with a CISC machine, rather than specifically
addressing variable-length vs. fixed-length instructions, but their
conclusions still apply. At least, if you accept the basic premise of RISC,
that is.

-- 
Eric Lee Green  elg@usl.CSNET           Asimov Cocktail,n., A verbal bomb
{cbosgd,ihnp4}!killer!elg               detonated by the mention of any
Snail Mail P.O. Box 92191               subject, resulting in an explosion
Lafayette, LA 70509                     of at least 5,000 words.
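The bus-width argument in this thread is simple arithmetic, so here is a minimal back-of-the-envelope sketch in Python. It assumes aligned fetches, a fixed number of bus transfers per cycle, and a pipeline that issues at most `issue_width` instructions per cycle; these are hypothetical simplifications, not anything from the post. All function names are illustrative. It checks the "next 5 instructions" figure (192/32 = 6 per transfer) and applies the quoted 70-90% share of 16-bit instructions to estimate average encoded size for a mixed 16/32-bit instruction set.

```python
# Hypothetical back-of-the-envelope model; names and simplifications are illustrative.

def instructions_per_fetch(bus_bits: int, insn_bits: int) -> int:
    """Whole fixed-size instructions delivered by one aligned bus transfer."""
    if bus_bits % insn_bits:
        raise ValueError("model assumes bus width is a multiple of instruction width")
    return bus_bits // insn_bits


def average_insn_bits(frac_short: float, short_bits: int = 16, long_bits: int = 32) -> float:
    """Mean encoded size of a mixed 16/32-bit instruction stream."""
    return frac_short * short_bits + (1.0 - frac_short) * long_bits


def sustained_rate(bus_bits: float, insn_bits: float,
                   transfers_per_cycle: float = 1.0, issue_width: int = 1) -> float:
    """Instructions per cycle, limited by whichever is slower: fetch or issue.

    Assumes an ideal (always-hitting) fetch path; decode cost is not modeled.
    """
    fetch_rate = transfers_per_cycle * bus_bits / insn_bits
    return min(issue_width, fetch_rate)


if __name__ == "__main__":
    for bus, insn in [(32, 16), (64, 32), (192, 32)]:
        n = instructions_per_fetch(bus, insn)
        print(f"{bus}-bit bus, {insn}-bit instructions: {n} per transfer, "
              f"{n - 1} fetched ahead of the current one")
    for f in (0.7, 0.9):
        avg = average_insn_bits(f)
        print(f"{f:.0%} short instructions: {avg:.1f} bits/insn on average, "
              f"{avg / 32:.0%} of fixed 32-bit code size")
```

Under these assumptions, `sustained_rate` is capped by `issue_width` whenever the bus carries at least one instruction per cycle, which is the post's point. The mixed encoding's smaller average size only matters when fetch, not issue, is the bottleneck, and the extra decode cost of variable-length instructions is deliberately left out of the model.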