Path: utzoo!attcan!uunet!lll-winken!ames!ames.arc.nasa.gov!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: Intel/MIPS Dhrystone ratio
Message-ID: <22839@ames.arc.nasa.gov>
Date: 17 Mar 89 21:12:24 GMT
References: <1552@vicom.COM> <15690@cup.portal.com> <1562@vicom.COM> <37196@bbn.COM> <1989Mar16.190043.23227@utzoo.uucp> <24889@amdcad.AMD.COM>
Sender: usenet@ames.arc.nasa.gov
Organization: NASA - Ames Research Center
Lines: 53

In article <24889@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
>In article <1989Mar16.190043.23227@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>| In article <37196@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>Also, auto-incrementing addressing modes imply:
>	- Another adder (to increment the address register in parallel)
>	- Another writeback port to the register file
>Unless you wish to sequence the instruction over multiple cycles :-(
>I'm certain that most people can find something better to do with these
>resources than auto-increment. 

I neither agree nor disagree with this.  But I think it should be noted
that auto-increment/decrement addressing modes are easy for compilers to
generate and can be parallelized in hardware, so they are potential
performance wins, although in practice they may not work out that way.  I am
sure people have simulated these questions to death and examined the various
possible code sequences.  You can also increment on compare-and-branch
(e.g. IBM BXLE).  Can you fill the delay slot in a branch if you have
already incremented, etc.?  Detailed simulations using a lot of different
kinds of source code are needed to answer questions like this.
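
For concreteness, here is a minimal C sketch of the idiom in question.  The
function names are mine, for illustration only.  sum_autoinc uses the *p++
form that a compiler can map onto an auto-increment load on machines that
have one; sum_explicit spells out the separate load and address add that a
machine without the mode would issue.  Both compute the same result; the
open question is only whether the extra adder and register write port pay
for themselves.

```c
#include <stddef.h>

/* Hypothetical illustration: load through p and bump p in one expression. */
long sum_autoinc(const int *p, size_t n)
{
    long total = 0;
    while (n-- > 0)
        total += *p++;      /* candidate for a single auto-increment load */
    return total;
}

/* Same loop with the load and the address update as separate steps. */
long sum_explicit(const int *p, size_t n)
{
    long total = 0;
    while (n-- > 0) {
        int v = *p;         /* load */
        p = p + 1;          /* separate add on the address register */
        total += v;
    }
    return total;
}
```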

In any case, this is a different situation
from the alignment problem below, since the performance loss for doing
unaligned data accesses is significant, the hardware designers tell us.
It is also a separate performance hit from the usual RISC/CISC issues.
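
A rough way to see where the unaligned cost comes from, as a sketch rather
than a measurement: count how many aligned memory words an access touches.
The function name and the bus parameter (memory width in bytes) are my own
assumptions, and the model ignores the shift-and-merge logic and any trap
overhead, which only add to the cost.

```c
#include <stdint.h>

/* Number of aligned, bus-width memory accesses needed to fetch `size`
 * bytes starting at byte address `addr`.  `bus` is the memory width in
 * bytes (assumed; 4 models a 32-bit data path).  Requires size > 0. */
unsigned aligned_accesses(uint32_t addr, unsigned size, unsigned bus)
{
    uint32_t first = addr / bus;                /* first memory word touched */
    uint32_t last  = (addr + size - 1) / bus;   /* last memory word touched  */
    return (unsigned)(last - first + 1);
}
```

With a 4-byte bus, a 4-byte load at address 0x1000 touches one word, while
the same load at 0x1001 touches two.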
>| As for hardware handling of unaligned data, this is purely a concession

**************

The VAX (and a few other architectures) is hard to pipeline because
the operand specifiers require a separate decode,
and because a variable number of operands may come from memory,
not because the machine has autoincrement/decrement addressing modes.
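
To illustrate the decode problem, here is a simplified C sketch of finding
where each VAX operand specifier starts.  The function names are mine, and
using a single opsize for every operand is a simplification (on the real
machine the opcode fixes each operand's data type).  The mode encodings
follow the usual VAX layout: mode in the high nibble of the first byte,
register in the low nibble.  Note that autoincrement on a general register
is a one-byte specifier, no longer than plain register mode; what hurts is
that the position of operand k depends on decoding every operand before it.

```c
#include <stddef.h>

/* Bytes occupied by the operand specifier starting at p[0], including any
 * index prefix.  `opsize` is the operand's data size in bytes and matters
 * only for immediate mode. */
size_t specifier_length(const unsigned char *p, size_t opsize)
{
    unsigned mode = p[0] >> 4;
    unsigned reg  = p[0] & 0x0F;

    switch (mode) {
    case 0x0: case 0x1: case 0x2: case 0x3:
        return 1;                                   /* short literal */
    case 0x4:
        return 1 + specifier_length(p + 1, opsize); /* index prefix + base */
    case 0x5: case 0x6: case 0x7:
        return 1;                                   /* Rn, (Rn), -(Rn) */
    case 0x8:
        return reg == 15 ? 1 + opsize : 1;          /* (Rn)+, or immediate via PC */
    case 0x9:
        return reg == 15 ? 1 + 4 : 1;               /* @(Rn)+, or absolute via PC */
    case 0xA: case 0xB:
        return 1 + 1;                               /* byte displacement */
    case 0xC: case 0xD:
        return 1 + 2;                               /* word displacement */
    default:
        return 1 + 4;                               /* 0xE, 0xF: longword displacement */
    }
}

/* Byte offset of operand k within the operand stream.  The loop is
 * inherently serial: each step needs the length of the previous operand. */
size_t operand_offset(const unsigned char *operands, size_t k, size_t opsize)
{
    size_t off = 0;
    size_t i;
    for (i = 0; i < k; i++)
        off += specifier_length(operands + off, opsize);
    return off;
}
```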


But really, the issue is not "complexity" (usually in the eye of the
beholder anyway) but ease of pipelining (a lot easier to measure).
The VAX (always the straw man in any RISC debate) achieves its design goals of:
" 1) all instructions should have the 'natural' number of operands and
  2) all operands should have the same generality in specification. "
(see Strecker's paper in Siewiorek, Bell, and Newell).  
It just so happened that these design goals, which produce a small number of 
very compact instructions (and thus overcome the problem of "most architectures"
as stated in the paper) for a given piece of source code, were the wrong
goals to pursue if another goal is PERFORMANCE.  OK, so they bet wrong on
the VAX...they bet that instruction compactness was very important.  Almost
immediately, they began to be proved wrong.  On the other hand, ten years
and billions of dollars of sales went by before the noise got to be
too loud, so, ...

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117