Path: utzoo!mnetor!uunet!steinmetz!sungoddess!oconnor
From: oconnor@sungoddess.steinmetz (Dennis M. O'Connor)
Newsgroups: comp.arch
Subject: Re: 16 & 32 bit vs 32 bit only instructions for RISC.
Message-ID: <9740@steinmetz.steinmetz.UUCP>
Date: 1 Mar 88 19:08:10 GMT
References: <2574@im4u.UUCP>
Sender: news@steinmetz.steinmetz.UUCP
Reply-To: sungoddess!oconnor@steinmetz.UUCP
Organization: GE Corporate R&D Center
Lines: 100

An article by elg@killer.UUCP (Eric Green) says:
] in article <2574@im4u.UUCP>, rajiv@im4u.UUCP (Rajiv N. Patel) says:
] >>All you 32-bit instruction advocates : how many of your 32-bits of
] >>instruction are usually wasted ( like by leading zeroes or ones, or
] >>unused register specifications ) ? If it sounds like I'd welcome a
] >>debate on the merits of 16 vs 32 bit instructions : sure. Isn't that
]
] The operative parameter here is, is the bus width {n:n>1} times greater than
] the instruction width? If so, then it doesn't matter how large the
] instructions are -- you'll always be able to fetch multiple instructions
] faster than you can execute them. For example, what you mentioned -- a 32 bit
] bus, with 16 bit instructions. Or a 64 bit bus, with 32 bit instructions.
] Execution-time wise, it doesn't matter either way, unless there's additional
] decoding overhead (such as, variable length instructions for the first, but
] not for the second).

The real question isn't bus WIDTH, but rather bus BANDWIDTH, usually
measured in megabytes per second. This is NOT an infinitely available
resource for single-chip CMOS processors. The package limits you to
a finite number of pins, and CMOS can only drive those pins at a
certain (technology-dependent) rate.

The comparison, then, is between your INTERNAL execution rate,
measured in MIPS, and your available instruction-fetch bandwidth,
measured in MB/sec. The ratio of (MB/s) over MIPS yields the
number of BYTES/INSTRUCTION. First order, of course.

] And then there is the case of cache.
] Whenever you fetch an opcode out of cache
] memory, you have no delays anyhow. Someone from ?Pyramid? posted about their
          ^^^^^^^^^^^^^^^^
          uh, really ? Don't you mean LESS delay ?

] architecture a long time ago. The cache is a very important part of that
] machine, and is integrated with a bus that's wider than the instruction/data
] width. For example, 32-bit instructions & data, with a 192-bit-wide memory
] bus. Considering the locality of data, that means the next 5 instructions will
] already be in cache, instantly available (nearbouts). I fail to see how this
] can slow the machine down any, except for the i/o overhead for loading it into
] main memory in the first place (which is not a cpu delay, but, rather, a
] response time delay during which some other process is running).

I'm just guessing, but it sounds like the Pyramid machine is NOT
a single chip microprocessor. Different horses for different courses,
and all that. Using 192 pins on a micro for the memory bus would
be pushing package technology, I think, once you added other signals,
address bus, power and ground.

] So, while 16-bit variable-length instructions with a 32-bit data bus may be a
] win on a machine with slow memory access time and no cache (i.e. your typical
] microcomputer, at the moment), 32-bit fixed length instructions with a 64-bit
] instruction-fetch memory interface will blow it into the weeds come
] ultimate-performance time, because of the lack of instruction decode overhead.

There is NO intrinsic reason 16-bit instructions would decode slower
than 32-bit instructions. In fact, they can ultimately decode FASTER :
the fewer bits your decoder has to look at, the faster it can be.
Barring other complications of course. I think the assumption you're
making is that a smaller instruction set has to be more complex to
get the job done. There are plenty of examples of this, but it's
NOT an immutable law.

] All a matter of keeping the memory interface bigger than the instruction
] size....

Dedicating 64 pins purely to instruction fetch (assuming a Harvard
architecture) is quite a lot of a rather scarce resource. Sure you
wanna do this on a micro ?

] I also have some papers here from the original RISC guys at UC-Berkeley, and
] AMD's design team, which discuss the issue at great length. Their basic
] conclusion is that the locality of reference of large caches means that 32-bit
] fixed length instructions are a big win even in the absence of a big memory
] interface. Of course, their basic problem was justifying the larger flow of
] instructions in a RISC machine as vs. a CISC machine, instead of specifically
] addressing variable-length vs. fixed-length instructions, but their
] conclusions still apply. At least, if you accept the basic premise of RISC,
] that is.

The appropriate measure of cache size, IMHO, is in INSTRUCTIONS.
Given you have some limited number of transistors to put into
a cache, then the smaller your instructions are, the "bigger"
your cache will be.

Also, instruction size affects several "second-order" performance
factors, like how quickly a program loads from a "low-speed"
(like disk) I/O device and how often you page-fault. This effect
is of course due to the fact that programs written in a 32-bit
RISC instruction set will be (according to our data) 65% larger
than the same program in a 16-bit RISC instruction set. Sorry,
we haven't published our data yet. It's just an analysis
(using information-theory) of existing data anyway.

] Eric Lee Green elg@usl.CSNET Asimov Cocktail,n., A verbal bomb
--
Dennis O'Connor UUNET!steinmetz!sunset!oconnor ARPA: OCONNORDM@ge-crd.arpa
(-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)
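
P.S. For anyone who wants to play with the first-order arithmetic in
this post (MB/s over MIPS gives bytes per instruction, and cache size
counted in instructions rather than bytes), here is a rough sketch.
The 40 MIPS figure is the one in the .sig. The 80 MB/s fetch bandwidth
and the 1024-byte cache are made-up illustration values, and the
function names are hypothetical; nothing here describes a real part.

```python
# Rough first-order arithmetic only. FETCH_MB_PER_S and CACHE_BYTES are
# ASSUMED illustration values, not measurements of any real processor.


def bytes_per_instruction(fetch_mb_per_s, mips):
    """(MB/s) / MIPS = bytes of fetch bandwidth available per instruction."""
    return fetch_mb_per_s / mips


def cache_size_in_instructions(cache_bytes, instr_bytes):
    """Number of whole instructions that fit in a cache of cache_bytes."""
    return cache_bytes // instr_bytes


MIPS = 40.0            # internal execution rate (the 40 MIPS in the .sig)
FETCH_MB_PER_S = 80.0  # ASSUMED pin-limited instruction-fetch bandwidth
CACHE_BYTES = 1024     # ASSUMED on-chip cache budget

budget = bytes_per_instruction(FETCH_MB_PER_S, MIPS)

for instr_bytes in (2, 4):  # 16-bit and 32-bit instruction formats
    keeps_up = instr_bytes <= budget
    held = cache_size_in_instructions(CACHE_BYTES, instr_bytes)
    print(f"{instr_bytes * 8}-bit instructions: "
          f"fetch keeps up = {keeps_up}, cache holds {held} instructions")
```

Change the assumed bandwidth or cache budget and the comparison moves
with it, which is the point: the answer depends on the ratio, not on
bus width alone.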