Path: utzoo!utgpu!attcan!uunet!seismo!sundc!pitstop!sun!decwrl!purdue!i.cc.purdue.edu!k.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.arch
Subject: Re: RISC v. CISC --more misconceptions
Summary: This is so slow as to be useless.
Message-ID: <1005@l.cc.purdue.edu>
Date: 3 Nov 88 12:30:50 GMT
References: <156@gloom.UUCP> <18931@apple.Apple.COM> <40@sopwith.UUCP> <7575@aw.sei.cmu.edu>
Organization: Purdue University Statistics Department
Lines: 39

In article <7575@aw.sei.cmu.edu>, firth@sei.cmu.edu (Robert Firth) writes:
> In article <998@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >There are many other operations which are cheap in hardware and expensive in
> >software...I will toss in a few for starters.
> 
> >	Find the distance to the next one in a bit stream FAST.  It would
> >be good to have an exception handler if one is not found.  I am considering
> >algorithms not worth implementing if the operation is slow.
> 
> This is implemented in hardware on the MC68020 by the 'BFFFO' instruction.
> It takes 18 cycles to find the first bit in a 32-bit register, and upto
> 32 cycles to find the first bit in a 32-bit bitfield in memory.
> 
> I find it hard to believe that a software algorithm on a RISC machine
> would do worse.  In the register case, for instance, a simple binary
> chop takes at most 5 steps, and each step is just (and-immediate-with-mask;
> branch if result zero).  One can get more cunning, but that gives you
> the answer in about 12 cycles.

For the purposes I had in mind, this is painful.  I am thinking of what
I call "infinite precision" methods of generating random variables (A
technical report is available on request.); these procedures have no
errors due to accuracy of computer operations.  For generating exponential
random variables, there are several methods which use ~5 BITS on the 
average, and then only use Boolean operations to generate the final result.
Finding the next 1 counts as 2 bits.  However, for such a procedure to be
practical, it must compete with procedures which use about 10 operations,
for one the slowest being two table lookups and a floating-point test.  

These competing procedures are obviously more complex computationally.
However, the architecture makes them much cheaper.  Also, since random
numbers of bits are used, one has to worry about where the current
bit pointer is and that the current word may become exhausted.  It is 
impossible to modify any good algorithm for generating non-uniform random
numbers to avoid uncertainties of this nature.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)