Path: utzoo!attcan!uunet!lll-winken!ames!ames.arc.nasa.gov!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: Intel/MIPS Dhrystone ratio
Message-ID: <22839@ames.arc.nasa.gov>
Date: 17 Mar 89 21:12:24 GMT
References: <1552@vicom.COM> <15690@cup.portal.com> <1562@vicom.COM> <37196@bbn.COM> <1989Mar16.190043.23227@utzoo.uucp> <24889@amdcad.AMD.COM>
Sender: usenet@ames.arc.nasa.gov
Organization: NASA - Ames Research Center
Lines: 53

In article <24889@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
>In article <1989Mar16.190043.23227@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>| In article <37196@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:

>Also, auto-incrementing addressing modes imply:
>	- Another adder (to increment the address register in parallel)
>	- Another writeback port to the register file
>Unless you wish to sequence the instruction over multiple cycles :-(
>I'm certain that most people can find something better to do with these
>resources than auto-increment.

I neither agree nor disagree with this.  But I think it should be noted
that auto-increment/decrement addressing modes are easy for compilers to
generate and can be parallelized in hardware, and are therefore potential
performance wins, although in practice it may not work out that way.  I am
sure people have simulated these questions to death, and examined the
various possibilities for code sequences.  You can also increment on
compare and branch (e.g. IBM BXLE).  Can you fill the delay slot in a
branch if you have already incremented, etc.?  Detailed simulations using
many different kinds of source code are needed to settle questions like
this.

In any case, this is a different situation from the alignment problem
below, since the hardware designers tell us that the performance loss for
unaligned data accesses is significant.  It is a separate performance hit
from the usual RISC/CISC issues.

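For concreteness, the auto-increment point above comes down to loops like
the following.  This is only a sketch: the function names are made up for
illustration, and whether a given compiler actually uses an auto-increment
mode for the first form depends on the target and the optimizer.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n words.  Each *p++ is a memory access followed by a pointer bump,
   the pattern an auto-increment addressing mode can fold into a single
   instruction (at the cost of the extra adder and register write port
   mentioned in the quoted text). */
void copy_words(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    while (n-- > 0) {
        *dst++ = *src++;
    }
}

/* The same copy spelled out the way a machine without auto-increment
   would sequence it: each address update is a separate add. */
void copy_words_explicit(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    while (n > 0) {
        int tmp = *src;   /* load */
        src = src + 1;    /* separate add for the source address */
        *dst = tmp;       /* store */
        dst = dst + 1;    /* separate add for the destination address */
        n = n - 1;
    }
}
```

Both functions produce the same result; the open question is only whether
folding the adds into the memory operations buys anything once the rest of
the pipeline is taken into account.
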
>| As for hardware handling of unaligned data, this is purely a concession
                               **************

The reason that the VAX (and a few other) architectures are hard to
pipeline is that the operand specifiers require a separate decode, and
that a variable number of operands may come from memory, not that the
machine has autoincrement/decrement addressing modes.  But really the
issue is not "complexity" (usually in the eye of the beholder anyway) but
ease of pipelining (a lot easier to measure).

The VAX (always the straw man in any RISC debate) achieves its design
goals of:

   "1) all instructions should have the 'natural' number of operands and
    2) all operands should have the same generality in specification."

(see Strecker's paper in Siewiorek, Bell, and Newell).  It just so
happened that these design goals, which produce a small number of very
compact instructions for a given piece of source code (and thus overcome
the problem of "most architectures" as stated in the paper), were the
wrong goals to pursue if another goal is PERFORMANCE.

OK, so they bet wrong on the VAX...they bet that instruction compactness
was very important.  Almost immediately, they began to be proved wrong.
On the other hand, ten years and billions of dollars of sales went by
before the noise got to be too loud, so, ...

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035
  Phone:  (415)694-6117