Path: utzoo!mnetor!uunet!steinmetz!sungoddess!oconnor
From: oconnor@sungoddess.steinmetz (Dennis M. O'Connor)
Newsgroups: comp.arch
Subject: Re: 16 & 32 bit vs 32 bit only instructions for RISC.
Message-ID: <9740@steinmetz.steinmetz.UUCP>
Date: 1 Mar 88 19:08:10 GMT
References: <2574@im4u.UUCP>
Sender: news@steinmetz.steinmetz.UUCP
Reply-To: sungoddess!oconnor@steinmetz.UUCP
Organization: GE Corporate R&D Center
Lines: 100

An article by elg@killer.UUCP (Eric Green) says:
] in article <2574@im4u.UUCP>, rajiv@im4u.UUCP (Rajiv N. Patel) says:
] >>All you 32-bit instruction advocates : how many of your 32-bits of
] >>instruction are usually wasted ( like by leading zeroes or ones, or
] >>unused register specifications ) ? If it sounds like I'd welcome a
] >>debate on the merits of 16 vs 32 bit instructions : sure. Isn't that
] 
] The operative parameter here is, is the bus width {n:n>1} times greater than
] the instruction width? If so, then it doesn't matter how large the
] instructions are -- you'll always be able to fetch multiple instructions
] faster than you can execute them. For example, what you mentioned -- a 32 bit
] bus, with 16 bit instructions. Or a 64 bit bus, with 32 bit instructions.
] Execution-time wise, it doesn't matter either way, unless there's additional
] decoding overhead (such as, variable length instructions for the first, but
] not for the second). 

The real question isn't bus WIDTH, but rather bus BANDWIDTH, usually
measured in megabytes per second. This is NOT an infinitely available
resource for single-chip CMOS processors. The package limits you to
only having a finite number of pins, and CMOS can only drive those
pins at a certain (technology-dependent) rate.

The comparison, then, is between your INTERNAL execution rate,
measured in MIPS, and your available instruction-fetch bandwidth,
measured in MB/sec. The ratio of (MB/s) over MIPS yields the
number of BYTES/INSTRUCTION you can afford. To first order, of course.
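
As a rough sketch of that first-order ratio (the bandwidth figures
and the function name are illustrative assumptions, not measurements
of any real chip):

```python
def bytes_per_instruction(fetch_bandwidth_mb_per_s, execution_rate_mips):
    """First-order instruction-fetch budget, in bytes per instruction.

    MB/s divided by MIPS: the factors of one million cancel, leaving
    bytes available per instruction executed.
    """
    return fetch_bandwidth_mb_per_s / execution_rate_mips


# Hypothetical figures: a 40 MIPS core fed at 80 MB/s or 160 MB/s.
for bandwidth in (80.0, 160.0):
    budget = bytes_per_instruction(bandwidth, 40.0)
    print(f"{bandwidth:.0f} MB/s at 40 MIPS: {budget:.1f} bytes/instruction")
```

If the budget comes out below your instruction size, the fetch path
rather than the execution unit sets the sustained rate, cache hits
aside.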
 
] And then there is the case of cache. Whenever you fetch an opcode out of cache
] memory, you have no delays anyhow.  Someone from ?Pyramid? posted about their
                   ^^^^^^^^^^^^^^^^ uh, really ? Don't you mean LESS delay ?

] architecture a long time ago. The cache is a very important part of that
] machine, and is integrated with a bus that's wider than the instruction/data
] width. For example, 32-bit instructions & data, with a 192-bit-wide memory
] bus. Considering the locality of data, that means the next 5 instructions will
] already be in cache, instantly available (nearbouts). I fail to see how this
] can slow the machine down any, except for the i/o overhead for loading it into
] main memory in the first place (which is not a cpu delay, but, rather, a
] response time delay during which some other process is running).

I'm just guessing, but it sounds like the Pyramid machine is NOT a
single chip microprocessor. Different horses for different courses,
and all that. Using 192 pins on a micro for the memory bus would
be pushing package technology, I think, once you added other signals,
address bus, power and ground.

] So, while 16-bit variable-length instructions with a 32-bit data bus may be a
] win on a machine with slow memory access time and no cache (i.e. your typical
] microcomputer, at the moment), 32-bit fixed length instructions with a 64-bit
] instruction-fetch memory interface will blow it into the weeds come
] ultimate-performance time, because of the lack of instruction decode overhead.

There is NO intrinsic reason 16-bit instructions would decode slower than
32-bit instructions. In fact, they can ultimately decode FASTER :
the fewer bits your decoder has to look at, the faster it can be.
Barring other complications of course.

I think the assumption you're making is that an instruction set with
smaller instructions has to be more complex to get the job done. There
are plenty of examples of this, but it's NOT an immutable law.

] All a matter of keeping the memory interface bigger than the instruction
] size.... 

Dedicating 64 pins purely to instruction fetch (assuming a Harvard
architecture) is quite a lot of a rather scarce resource. Sure
you wanna do this on a micro ?

] I also have some papers here from the original RISC guys at UC-Berkeley, and
] AMD's design team, which discuss the issue at great length. Their basic
] conclusion is that the locality of reference of large caches means that 32-bit
] fixed length instructions are a big win even in the absence of a big memory
] interface. Of course, their basic problem was justifying the larger flow of
] instructions in a RISC machine as vs. a CISC machine, instead of specifically
] addressing variable-length vs. fixed-length instructions, but their
] conclusions still apply. At least, if you accept the basic premise of RISC,
] that is. 

The appropriate measure of cache size, IMHO, is in INSTRUCTIONS.
Given you have some limited number of transistors to put into
a cache, then the smaller your instructions are, the "bigger"
your cache will be.
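
A minimal sketch of that measure, assuming fixed-length instructions
and a hypothetical 2048-byte on-chip cache (the size and the function
name are made up for illustration):

```python
def cache_capacity_in_instructions(cache_bytes, instruction_bits):
    """Number of fixed-size instructions that fit in cache_bytes."""
    return (cache_bytes * 8) // instruction_bits


CACHE_BYTES = 2048  # hypothetical on-chip instruction cache size

for bits in (16, 32):
    count = cache_capacity_in_instructions(CACHE_BYTES, bits)
    print(f"{bits}-bit instructions: {count} fit in {CACHE_BYTES} bytes")
```

The raw count overstates the gain if each 16-bit instruction does
less work than a 32-bit one, so treat it as an upper bound.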

Also, instruction size affects several "second-order" performance
factors, like how quickly a program loads from a "low-speed" (like
disk) I/O device and how often you page-fault. This effect
is of course due to the fact that programs written in a 32-bit
RISC instruction set will be (according to our data) 65% larger
than the same program in a 16-bit RISC instruction set.
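
To put rough numbers on those second-order effects, here is a sketch
that takes the 65% figure above as given. The program size, page
size, disk rate, and function names are hypothetical, chosen only to
show the arithmetic:

```python
import math

SIZE_PERCENT_32_VS_16 = 165  # 32-bit code is 65% larger, per the figure above


def size_in_32_bit_encoding(program_bytes_16):
    """Estimated size of the same program in the 32-bit encoding."""
    return program_bytes_16 * SIZE_PERCENT_32_VS_16 // 100


def pages_needed(program_bytes, page_bytes):
    """Pages touched if the whole program text is brought in."""
    return math.ceil(program_bytes / page_bytes)


def load_time_seconds(program_bytes, device_bytes_per_s):
    """Transfer time from a slow device, ignoring seek and latency."""
    return program_bytes / device_bytes_per_s


# Hypothetical workload: 100,000 bytes of 16-bit code, 4096-byte pages,
# and a disk that delivers 500,000 bytes per second.
size16 = 100_000
size32 = size_in_32_bit_encoding(size16)
for label, size in (("16-bit", size16), ("32-bit", size32)):
    pages = pages_needed(size, 4096)
    seconds = load_time_seconds(size, 500_000)
    print(f"{label}: {size} bytes, {pages} pages, {seconds:.2f} s to load")
```

Both the page count and the load time scale more or less directly
with code size, which is the whole point.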

Sorry, we haven't published our data yet. It's just an analysis
(using information-theory) of existing data anyway.

] Eric Lee Green  elg@usl.CSNET     Asimov Cocktail,n., A verbal bomb


--
    Dennis O'Connor			      UUNET!steinmetz!sunset!oconnor
		   ARPA: OCONNORDM@ge-crd.arpa
   (-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)