Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!labrea!decwrl!pyramid!voder!apple!baum
From: baum@apple.UUCP
Newsgroups: comp.arch
Subject: Re: What should be in hardware but isn't
Message-ID: <6336@apple.UUCP>
Date: Thu, 24-Sep-87 12:48:14 EDT
Article-I.D.: apple.6336
Posted: Thu Sep 24 12:48:14 1987
Date-Received: Sat, 26-Sep-87 13:43:23 EDT
References: <581@l.cc.purdue.edu> <18336@amdcad.AMD.COM> <582@l.cc.purdue.edu>
Reply-To: baum@apple.UUCP (Allen Baum)
Organization: Apple Computer, Inc.
Lines: 63

--------
[]
>In article <582@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>
>Olson greatly underestimates the number of RISC instructions needed to do
>even a fair job.

I don't think that Olson is underestimating anything. Most RISC architectures
have a divide step instruction, which is precisely what underlying microcode
would use. Furthermore, in order to get signed/unsigned variations, microcode
has to do the same kinds of conditional operations that a RISC would have to
do. It is a mistake to assume that a RISC would be slower to do these than
a microcoded engine; some RISC machines (Acorn ARM, HP Spectrum) have support
for conditional operations. Furthermore, any hardware support in excess of this
will inevitably slow the basic cycle down (I've been through the exercise).
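
To make the divide-step point concrete, here is a minimal C sketch of a
one-bit-per-step divide and the conditional fix-ups needed for the signed
variation. It is an illustration only, not any particular machine's
instruction: the names divide_step, udiv32 and sdiv32 are made up, and it
uses the simple restoring form, whereas real divide-step instructions are
often nonrestoring. Either way, microcode or a RISC instruction sequence
ends up doing the same work.

```c
#include <stdint.h>

/* One hypothetical "divide step": shift the remainder:quotient pair left
 * by one bit, try to subtract the divisor, and record one quotient bit.
 * The remainder is kept in 64 bits so the shift cannot overflow. */
static void divide_step(uint64_t *rem, uint32_t *quo, uint32_t divisor)
{
    *rem = (*rem << 1) | (*quo >> 31);  /* bring down next dividend bit */
    *quo <<= 1;
    if (*rem >= divisor) {              /* the conditional operation */
        *rem -= divisor;
        *quo |= 1;
    }
}

/* Unsigned 32/32 divide: 32 steps, one quotient bit per step.
 * The caller must make sure divisor is nonzero. */
uint32_t udiv32(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    uint64_t rem = 0;
    uint32_t quo = dividend;
    for (int i = 0; i < 32; i++)
        divide_step(&rem, &quo, divisor);
    *remainder = (uint32_t)rem;
    return quo;
}

/* Signed divide, truncating toward zero: the same loop wrapped in
 * conditional negations. The casts back to int32_t assume the usual
 * two's complement conversion; INT32_MIN / -1 wraps, as it would in
 * hardware that ignores the overflow. */
int32_t sdiv32(int32_t a, int32_t b, int32_t *remainder)
{
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t ur;
    uint32_t uq = udiv32(ua, ub, &ur);
    if ((a < 0) != (b < 0))
        uq = 0u - uq;                   /* quotient sign fix-up */
    if (a < 0)
        ur = 0u - ur;                   /* remainder follows the dividend */
    *remainder = (int32_t)ur;
    return (int32_t)uq;
}
```

The loop body is the whole cost: one shift, one compare, one conditional
subtract per quotient bit, whoever issues them.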

> I have not seen any remotely efficient bit-handling hardware on any machine.

Check out the HP Spectrum.

>  To do unsigned
>multiplication with only signed multiplication available requires that
>2 conditional additions must be done after the multiplication; as machines
>get faster conditional operations are bad except in nanocode.  Unsigned
>division is so complicated that one introduces other inefficiencies instead.

Again, you make the mistake of believing that nanocode is somehow
magically faster or more efficient than a well designed instruction
set. Wrong. Microcode, or nanocode, has to go
through all the same operations that assembly level code does. While
special purpose data paths can be included to make the sign
correction run faster, it is just that: special purpose. It can't be
used for anything else, it may have the effect of making everything
else run slower, and making division run a cycle or two faster will
have no noticeable effect on performance. It's VERY difficult to make
fixed point division run faster than a bit per cycle, without a LOT
of hardware. By leaving out the special purpose speedup stuff, you can afford
to include some VERY useful general purpose speedup stuff: More registers,
perhaps, or branch folding logic a la the ATT CRISP.
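
For reference, the sign correction being argued about is small. Here is a
minimal C sketch, assuming a signed 32x32->64 multiply is the only one
available; the function names are made up for illustration. Only the high
word needs fixing (the low 32 bits of the signed and unsigned products are
identical), and the fix is the two conditional additions mentioned above.

```c
#include <stdint.h>

/* High 32 bits of the unsigned product a*b, computed from a signed
 * multiply plus two conditional additions. Reinterpreting a and b as
 * signed assumes the usual two's complement conversion. */
uint32_t umulhi_from_signed(uint32_t a, uint32_t b)
{
    int32_t sa = (int32_t)a;
    int32_t sb = (int32_t)b;
    int64_t sp = (int64_t)sa * (int64_t)sb;      /* signed multiply */
    uint32_t hi = (uint32_t)((uint64_t)sp >> 32);

    if (sa < 0)
        hi += b;    /* a was read as a - 2^32, so add back b * 2^32 */
    if (sb < 0)
        hi += a;    /* likewise for b */
    return hi;      /* additions wrap modulo 2^32, which is what we want */
}

/* Same fix-up with masks instead of branches, the kind of thing a
 * machine with conditional operations can do without a branch. */
uint32_t umulhi_branch_free(uint32_t a, uint32_t b)
{
    int64_t sp = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    uint32_t hi = (uint32_t)((uint64_t)sp >> 32);
    hi += b & (0u - (a >> 31));   /* mask is all ones if a's top bit set */
    hi += a & (0u - (b >> 31));
    return hi;
}

/* Direct unsigned computation, for checking the two versions above. */
uint32_t umulhi_reference(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```

Whether these two additions are issued by nanocode or by ordinary
instructions, they are the same two additions.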

>
>BTW, there is an address modification procedure which is missing on all
>machines I have seen except the UNIVAC's.  That is to consider the register
>file as a memory block and allow indexing on it.  Another missing procedure
>is to enable the register file to be treated as a block of memory so that
>bytes or short words can be addressed.  These two operations can be combined
>on a byte-addressable machine.

The original PDP-10 from DEC allowed that. Registers were the first 16
locations in memory. Because registers were real expensive at the time,
hardware registers were an expensive (but effective) speedup option;
without the option, those locations went to real memory. This came back
to bite them in the later KL models, because instructions could be put
into the registers and executed from them. While this was a real
speedup hack on the older models, it slowed down the newer ones.
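
A rough C sketch of that kind of aliasing, loosely modeled on the scheme
just described and not on any real machine's details: the struct, the
function names, the memory size and the word width are all assumptions
made for illustration.

```c
#include <stdint.h>

#define NUM_REGS  16     /* registers alias memory addresses 0..15 */
#define MEM_WORDS 1024   /* arbitrary small memory for the sketch */

typedef struct {
    uint64_t reg[NUM_REGS];    /* fast register file */
    uint64_t core[MEM_WORDS];  /* main memory; words 0..15 are shadowed */
} machine;

/* Every memory reference checks the address first: low addresses go
 * to the register file, everything else goes to main memory. */
uint64_t mem_read(const machine *m, uint32_t addr)
{
    addr %= MEM_WORDS;
    return addr < NUM_REGS ? m->reg[addr] : m->core[addr];
}

void mem_write(machine *m, uint32_t addr, uint64_t value)
{
    addr %= MEM_WORDS;
    if (addr < NUM_REGS)
        m->reg[addr] = value;
    else
        m->core[addr] = value;
}

/* Indexed read: effective address = base + contents of an index
 * register. Because it goes through mem_read, the effective address
 * can land inside the register file, which gives indexing on the
 * registers for free. */
uint64_t indexed_read(const machine *m, uint32_t base, unsigned index_reg)
{
    uint32_t ea = base + (uint32_t)m->reg[index_reg % NUM_REGS];
    return mem_read(m, ea);
}
```

The address compare on every reference is cheap here, but it is also why
instruction fetch from the registers becomes awkward once the rest of the
machine expects instructions to come from ordinary memory.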

The ATT CRISP doesn't have any registers. But, by caching the top of
the local frame, references to locals are effectively turned into
register references, and you get register windows as well. You can
index into these 'registers', byte access them, and reference them
with short 5-bit fields in the instruction.

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385