Path: utzoo!mnetor!uunet!husc6!cmcl2!nrl-cmf!mailrus!tut.cis.ohio-state.edu!bloom-beacon!mit-eddie!bbn!rochester!cornell!batcomputer!itsgw!imagine!pawl22.pawl.rpi.edu!jesup
From: jesup@pawl22.pawl.rpi.edu (Randell E. Jesup)
Newsgroups: comp.arch
Subject: Re: 16 & 32 bit vs 32 bit only instructions for RISC.
Message-ID: <485@imagine.PAWL.RPI.EDU>
Date: 7 Mar 88 06:46:12 GMT
References: <2574@im4u.UUCP> <9740@steinmetz.steinmetz.UUCP> <7538@apple.Apple.Com> <1757@mips.mips.COM>
Sender: news@imagine.PAWL.RPI.EDU
Reply-To: beowulf!lunge!jesup@steinmetz.UUCP
Organization: RPI Public Access Workstation Lab - Troy, NY
Lines: 68

In article <1757@mips.mips.COM> hansen@mips.COM (Craig Hansen) writes:
> Instruction bandwidth is
>important, but not so important that you should go back to compacted
>instructions. 32-bit instructions aren't much larger than 16-bit
>instructions, particularly when a register-allocating compiler is
>used, and the benefit to permitting parallel decoding of instructions
>with register fetching is a tremendous win.

	Who said we were going back to compacted instructions to get to 16
bits?  What we have is a lot FEWER instructions, and a minimal set of
formats for instructions.  With 32-bit instructions, we could have had fewer
formats (2 or 3 instead of 5 or 6), but since we can do a decode in a
single pipe-stage, what does it matter?  The decoder does not determine
the critical path and cycle time; the ALU does.  If the decoder had slowed
us down, we would have made it faster and/or reduced the number of formats.
(Keeping the number of formats down and keeping fields aligned did play a
role in our architecture design.)
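
	To illustrate why field alignment matters more than the raw number of
formats, here is a rough sketch in Python.  The bit layout, the opcode split
and the names (register_fields, decode) are all made up for illustration;
this is NOT the encoding of any real chip.  The point is only that if the
register fields sit at the same bit positions in every format, the register
numbers can be pulled out before the format is known.

```python
# Hypothetical 16-bit encoding, invented for this sketch:
#   bits 15-12 opcode, bits 11-8 rd, bits 7-4 rs, bits 3-0 format-dependent.
OPCODE_SHIFT, RD_SHIFT, RS_SHIFT = 12, 8, 4
FIELD_MASK = 0xF


def register_fields(word):
    """Extract rd and rs without looking at the opcode or format."""
    rd = (word >> RD_SHIFT) & FIELD_MASK
    rs = (word >> RS_SHIFT) & FIELD_MASK
    return rd, rs


def decode(word):
    """Full decode; only the low 4 bits depend on the format."""
    opcode = (word >> OPCODE_SHIFT) & FIELD_MASK
    rd, rs = register_fields(word)
    low = word & FIELD_MASK
    if opcode < 8:
        # Assumed register-register format: low bits name a second source.
        return {"op": opcode, "rd": rd, "rs": rs, "rt": low}
    # Assumed short-immediate format: low bits are a literal.
    return {"op": opcode, "rd": rd, "rs": rs, "imm": low}
```

In hardware the equivalent of register_fields() would run alongside the
opcode decode rather than after it, which is the property the sketch is
meant to show.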

	I don't see how having a register-allocating compiler affects
instruction size.

	Concerning parallel decode with register fetch, is this anything
unusual?  Our pipeline looks like this:

	<IF> Instruction fetch - doesn't really exist per se.
	<ID> Instruction Decode/register fetch
	<ALU> ALU operation
	<WB> WriteBack - write ALU result to register file
	[ I'm ignoring the extra load stages here ]

I don't see what we're losing here.
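
	For anyone who wants to see the overlap spelled out, here is a small
Python sketch of an ideal, stall-free version of the stages listed above.
It is a simplification under my own assumptions: no load stages, no hazards,
and IF left out since it doesn't really exist per se.  The function names
are hypothetical.

```python
STAGES = ["ID", "ALU", "WB"]  # IF omitted; extra load stages ignored


def schedule(n_instructions, stages=STAGES):
    """Map each cycle to {stage: instruction index} for a stall-free pipe."""
    table = {}
    for i in range(n_instructions):
        for offset, stage in enumerate(stages):
            # Instruction i enters the first stage at cycle i and advances
            # one stage per cycle after that.
            table.setdefault(i + offset, {})[stage] = i
    return table


def total_cycles(n_instructions, stages=STAGES):
    """Cycles needed to drain n instructions through the pipe."""
    if n_instructions == 0:
        return 0
    return n_instructions + len(stages) - 1


if __name__ == "__main__":
    for cycle, busy in sorted(schedule(4).items()):
        print(cycle, busy)
```

Once the pipe is full, every stage is busy on every cycle, and decode and
register fetch for one instruction share the ID stage while the previous
instruction is in the ALU.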

>Generally, optimized MIPS code about 10% to 50% larger than "optimized"
>VAX code, as generated by 4.3 UNIX, and is often equal or smaller in
>size than optimized 68k code, as generated by Sun compilers.

[ many figures deleted ]

>...those big 32-bit instructions don't look so bad next to
>the machines design for compact encodings...

	68020? compact?  Surely you jest!  :-)  I know, it actually is fairly
compact, at least the 68000 part of it.  It just has SO many instructions and
addressing modes, it ends up larger than one would suspect.

	The proper comparison is not to CISCs, but to a 16-bit version of the
same general architecture, or at least the same class (RISCs).

	I agree that if cost is no object, a 32-bit RISC can probably run
faster (in effective throughput (VIPS), not MIPS) than a 16-bit one.  However,
the costs include a higher-bandwidth bus, more disk space for code,
more memory space for code, larger (expensive) caches, more power draw (more
pins being driven), etc.  The typical current solution to the
bus bandwidth problem is to throw MUCH bigger caches onto the CPU board, to
raise hit rates and so reduce the bandwidth required of the bus.
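
	A back-of-the-envelope model of the bus trade-off, as a Python sketch.
Every number in the usage note is a made-up placeholder, not a measurement
of any machine, and the model is deliberately crude: it ignores cache line
size, data traffic and burst transfers.  The path_length_factor parameter is
my own assumption, standing in for the extra instructions a 16-bit encoding
may need to do the same work.

```python
def instruction_bus_traffic(instr_per_sec, bytes_per_instr, miss_rate,
                            path_length_factor=1.0):
    """Rough instruction-fetch traffic reaching the bus, in bytes/second.

    instr_per_sec      -- instruction rate of the baseline encoding
    bytes_per_instr    -- 2 for 16-bit instructions, 4 for 32-bit
    miss_rate          -- fraction of fetches that miss the cache (0.0-1.0)
    path_length_factor -- assumed ratio of instructions executed relative
                          to the baseline, for the same work
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return instr_per_sec * path_length_factor * bytes_per_instr * miss_rate
```

With placeholder values, instruction_bus_traffic(10e6, 4, 0.1) models a
32-bit encoding, and instruction_bus_traffic(10e6, 2, 0.1,
path_length_factor=1.2) models a 16-bit one that executes 20% more
instructions.  In this model the 16-bit case needs less bus bandwidth at the
same miss rate unless its path length is more than double; the other way to
cut 32-bit traffic is to lower the miss rate, i.e. a bigger cache.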

>Craig Hansen
>Manager, Architecture Development
>MIPS Computer Systems, Inc.

Glad to see you in the conversation.  I'm interested in hearing your opinions.

     //	Randell Jesup			      Lunge Software Development
    //	Dedicated Amiga Programmer            13 Frear Ave, Troy, NY 12180
 \\//	beowulf!lunge!jesup@steinmetz.UUCP    (518) 272-2942
  \/    (uunet!steinmetz!beowulf!lunge!jesup) BIX: rjesup

(-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)