Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!mips!zalman
From: zalman@mips.com (Zalman Stern)
Newsgroups: comp.arch
Subject: Re: Compilers and efficiency
Message-ID: <3409@spim.mips.COM>
Date: 11 May 91 11:37:44 GMT
References: <653@ctycal.UUCP> <12054@mentor.cc.purdue.edu> <7738@auspex.auspex.com>
Sender: news@mips.COM
Organization: MIPS Computer Systems, Sunnyvale, California
Lines: 48
Nntp-Posting-Host: dish.mips.com

In article <7738@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
[...]
>Now, there are other places where finding the distance to the next 1 in
>a bit stream is useful. It may well be that it's worth providing some
>kind of hardware assist for it; I'm curious what forms of "hardware
>assist" of that sort exist (other than doing it using the obvious
>simple-minded loop, but in microcode).

The IBM RS/6000 has a single-cycle count-leading-zeros instruction. The
hardware is also used by the multiplier to short-circuit multiplies by
smaller constants. A multiply takes from 3 to 5 cycles depending on the
size of the multiplier; the clz hardware is used to decide how much work
it has to do.

>
>Some questions then are "what other useful hardware would you have to
>sacrifice, in some given implementation, to add the hardware assist for
>'find next 1'?" and "will we, at some point, not have to sacrifice
>anything really useful to get it?" so that you might want to put such an
>operation into the instruction set anyway, and have a non-assisted
>implementation early on.

A count-leading-zeros (or ffs if you prefer) instruction is not a big deal
other than that you have to design the hardware to do it. (I.e. it doesn't
place unreasonable demands on the register file, and it doesn't eat lots of
opcode space.) However, it's not clear that this single instruction does
the job. You can either look for a 1 or a 0 bit, and you can look from most
significant to least significant or vice-versa.
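
To make the four variants concrete, here is a rough sketch in C of how the other three could be synthesized from one count-leading-zeros primitive. Everything named here is an assumption for illustration: clz32 is a plain software loop standing in for the hardware instruction, the 32-bit word size is assumed, and clo32, ctz32, cto32 and check_bit_scans are made-up names, not any machine's mnemonics. Returning 32 for an all-zero input is also just a convention chosen for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for a hardware count-leading-zeros instruction.
   Counts zero bits above the most significant 1; returns 32 if x == 0. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Look for a 0 from the most significant end: complement, then clz. */
static unsigned clo32(uint32_t x)
{
    return clz32((uint32_t)~x);
}

/* Look for a 1 from the least significant end: x & -x isolates the
   lowest set bit, and its position is 31 minus its leading-zero count. */
static unsigned ctz32(uint32_t x)
{
    uint32_t lowest;
    if (x == 0)
        return 32;
    lowest = x & (uint32_t)(~x + 1u);
    return 31 - clz32(lowest);
}

/* Look for a 0 from the least significant end: complement, then ctz. */
static unsigned cto32(uint32_t x)
{
    return ctz32((uint32_t)~x);
}

/* Usage example: one check per variant. */
void check_bit_scans(void)
{
    assert(clz32(0x00F00000u) == 8);   /* highest 1 is bit 23 */
    assert(clo32(0xFF000000u) == 8);   /* eight leading 1s */
    assert(ctz32(0x00F00000u) == 20);  /* lowest 1 is bit 20 */
    assert(cto32(0x0000000Fu) == 4);   /* four trailing 1s */
}
```

The search-for-0 forms cost one extra complement, and the least-significant-first forms cost a negate, an AND and a subtract on top of the clz. Whether that overhead is acceptable is exactly the open question about a single instruction doing the job.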
Herman probably wants all of these options...

>
>(Is 'find next 1' similar enough to floating-point normalization that
>the same hardware assistance can be used for both?"

Putting it in the FP unit makes it harder to use the result for shifts and
such on machines with separate FP and integer units. (Hooking the integer
register file up to the FP normalization hardware is going to be very
painful.)

My guess is that special purpose hardware would be useful for this sort of
operation. DMA it in at bus speed and write the values into scratch RAM.
Interrupt the processor when the scratch RAM gets full or the buffer gets
empty. (Or DMA the scratch RAM back out too...) Go ahead and toss a Xilinx
or something into the hardware to make it programmable.
-- 
Zalman Stern, MIPS Computer Systems, 928 E. Arques 1-03, Sunnyvale, CA 94088
zalman@mips.com OR {ames,decwrl,prls,pyramid}!mips!zalman (408) 524 8395
"Never rub another man's rhubarb" -- the Joker via Pop Will Eat Itself