Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!nrl-cmf!cmcl2!brl-adm!umd5!uvaarpa!mcnc!decvax!decwrl!hplabs!hpda!hpcupt1!viggy
From: viggy@hpcupt1.HP.COM (Viggy Mokkarala)
Newsgroups: comp.arch
Subject: Re: Re: Auto-shifted registers (and ease of compiler writing)
Message-ID: <6310007@hpcupt1.HP.COM>
Date: 1 Mar 88 17:47:18 GMT
References: <1390@vaxb.calgary.UUCP>
Organization: Hewlett Packard, Cupertino
Lines: 42


radford@calgary.UUCP (Radford Neal) writes:

>In article <6310005@hpcupt1.HP.COM>, viggy@hpcupt1.HP.COM (Viggy Mokkarala) writes:

>> The HP Precision Architecture provides for these kinds of operations by its
>> Shift and Add instructions.  There is a pre-shifter before one of the inputs to
>> the CPU.  It allows for one of the operands to be pre-shifted by upto 3 bits
>> before an addition happens. 

>Does this cost you anything (other than chip area), or does the shift
>overlap another operation? If it does overlap, is there any reason
>not to allow it for all relevant instructions (e.g. and and or)?
>Is there any reason to keep the unshifted add (given a shift of zero 
>is possible)?

>    Radford Neal

Sorry about the previous failed posting.

The pre-shifter before one of the CPU inputs is the same shifter that is used
for the "indexed loads" of the HP Precision Arch.  The index register (which
can be any of the general registers) may be optionally shifted left by 1, 2,
or 3 bits so that integer addressing to half words, words, or double words is
possible.
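
To make the indexed-load case concrete, here is a rough sketch in C of the
address arithmetic described above.  It is only an illustration: the function
name indexed_address and the 32-bit types are assumptions for the example,
not anything taken from the architecture manual.

```c
#include <stdint.h>

/* Hypothetical model of the address computed for an indexed load:
 * the index register is shifted left by 0 to 3 bits by the pre-shifter,
 * then added to the base register. */
static uint32_t indexed_address(uint32_t base, uint32_t index, unsigned shift)
{
    /* Only shift amounts 0..3 are meaningful here, so mask to two bits. */
    return base + (index << (shift & 3u));
}
```

With a shift of 2, an index of 5 selects the sixth 4-byte word past the base,
i.e. a byte offset of 20.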

Therefore, it turns out that the "shift" operation in the shift and add
instructions, which were included as primitives for integer multiplication,
comes for free.  Instructions such as Shift and And, or Shift and Or, are not
frequently used instructions, and weren't considered.  There isn't a Shift Zero
and Add instruction (it is simply called the Add instruction, to keep the
instruction names simple I guess :-) ).
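
For readers wondering how shift and add serves as a multiplication primitive,
the following C sketch shows multiplication by small constants built from it.
The helper names (shift_add, times5, times9, times10) are made up for this
example, and the sequences are just one possible choice, not necessarily what
a compiler for this machine would emit.

```c
#include <stdint.h>

/* Hypothetical model of a shift and add: (a << shift) + b, shift in 0..3.
 * A shift of zero is just an ordinary add. */
static uint32_t shift_add(uint32_t a, unsigned shift, uint32_t b)
{
    return (a << (shift & 3u)) + b;
}

/* x * 5 = (x << 2) + x, one shift and add. */
static uint32_t times5(uint32_t x)
{
    return shift_add(x, 2, x);
}

/* x * 9 = (x << 3) + x, one shift and add. */
static uint32_t times9(uint32_t x)
{
    return shift_add(x, 3, x);
}

/* x * 10 = ((x * 5) << 1) + 0, two shift and adds. */
static uint32_t times10(uint32_t x)
{
    uint32_t t = shift_add(x, 2, x);
    return shift_add(t, 1, 0);
}
```

Each step is a single add with one operand pre-shifted, so a multiply by a
small constant becomes a short chain of such operations.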

In the HP 9000/840 (the first HPPA product - off the shelf TTL parts), the
pre-shift operation takes place in the beginning of the execute phase (the pipe
is 3 stages deep, and the execute phase is the second one).  The pre-shift
operation takes 6-7 nsec and happens in parallel with the immediate generation.
All register operands pass through the pre-shifter, but only some indexed
loads and the shift and add instructions are shifted by more than zero.
In this implementation, it required about 8 or 9 packs extra to do this.

Viggy Mokkarala, Hewlett Packard Co., Cupertino, CA.
(hpda!viggy)