Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!mips!zalman
From: zalman@mips.com (Zalman Stern)
Newsgroups: comp.arch
Subject: Re: Compilers and efficiency
Message-ID: <3409@spim.mips.COM>
Date: 11 May 91 11:37:44 GMT
References: <653@ctycal.UUCP> <12054@mentor.cc.purdue.edu> <7738@auspex.auspex.com>
Sender: news@mips.COM
Organization: MIPS Computer Systems, Sunnyvale, California
Lines: 48
Nntp-Posting-Host: dish.mips.com

In article <7738@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
[...]
>Now, there are other places where finding the distance to the next 1 in
>a bit stream is useful.  It may well be that it's worth providing some
>kind of hardware assist for it; I'm curious what forms of "hardware
>assist" of that sort exist (other than doing it using the obvious
>simple-minded loop, but in microcode).

The IBM RS/6000 has a single-cycle count leading zeros (clz) instruction.
The same hardware is also used by the multiplier to short-circuit multiplies
by small constants: a multiply takes from 3 to 5 cycles depending on the
size of the multiplier, and the clz hardware is used to decide how much
work the multiplier has to do.
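
For anyone who wants to see what the operation computes, here is a rough
software sketch in C. It is not the RS/6000 implementation. The names
clz32 and significant_bits are made up for illustration, and treating the
multiplier as unsigned is an assumption. How the hardware maps the bit
count onto 3, 4 or 5 cycles is not something this sketch tries to model.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit word by binary search.
   Returns 32 when x is 0. Hypothetical helper, not a real API. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0) return 32;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

/* Number of significant bits in an (assumed unsigned) multiplier.
   A multiplier with early-out could use a count like this to decide
   how many steps it needs; the exact policy is not modeled here. */
static unsigned significant_bits(uint32_t m)
{
    return 32 - clz32(m);
}
```

The fewer significant bits the multiplier has, the less work is left to
do, which is the idea behind the early-out.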

>
>Some questions then are "what other useful hardware would you have to
>sacrifice, in some given implementation, to add the hardware assist for
>'find next 1'?" and "will we, at some point, not have to sacrifice
>anything really useful to get it?" so that you might want to put such an
>operation into the instruction set anyway, and have a non-assisted
>implementation early on.

A count leading zeros (or ffs if you prefer) instruction is not a big deal
other than that you have to design the hardware to do it. (I.e. it doesn't
place unreasonable demands on the register file and it doesn't eat lots of
opcode space.) However, it's not clear that this single instruction does
the job. You can look for either a 1 or a 0 bit, and you can look from most
significant to least significant or vice versa. Herman probably wants all
of these options...
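
To make the four options concrete, here is a sketch in C of how the other
three can be built from a single clz primitive at the cost of a few extra
instructions each. The function names (clz, clo, ctz, cto) are labels
chosen for this example, and the loop-based clz stands in for whatever the
hardware would provide. Whether those extra instructions are acceptable
in an inner loop is exactly the open question.

```c
#include <stdint.h>

/* Stand-in for a hardware clz: scan from the most significant bit.
   Returns 32 when x is 0. */
static unsigned clz(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading ones: complement, then count leading zeros. */
static unsigned clo(uint32_t x)
{
    return clz(~x);
}

/* Count trailing zeros: x & -x isolates the lowest set bit, and clz of
   that single bit gives its position from the top. Returns 32 for 0. */
static unsigned ctz(uint32_t x)
{
    return x ? 31 - clz(x & (0u - x)) : 32;
}

/* Count trailing ones: complement, then count trailing zeros. */
static unsigned cto(uint32_t x)
{
    return ctz(~x);
}
```

For example, clz(0x00F00000), clo(0xFF000000), ctz(0x00000100) and
cto(0x000000FF) all return 8.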

>
>(Is 'find next 1' similar enough to floating-point normalization that
>the same hardware assistance can be used for both?)

Putting it in the FP unit makes it harder to use the result for shifts and
such on machines with separate FP and integer units. (Hooking the integer
register file up to the FP normalization hardware is going to be very
painful.)

My guess is that special purpose hardware would be useful for this sort of
operation. DMA the data in at bus speed and write the values into scratch
RAM. Interrupt the processor when the scratch RAM gets full or the buffer
gets empty. (Or DMA the scratch RAM back out too...) Go ahead and toss a
Xilinx or something into the hardware to make it programmable.
-- 
Zalman Stern, MIPS Computer Systems, 928 E. Arques 1-03, Sunnyvale, CA 94088
zalman@mips.com OR {ames,decwrl,prls,pyramid}!mips!zalman     (408) 524 8395
  "Never rub another man's rhubarb" -- the Joker via Pop Will Eat Itself