Xref: utzoo comp.arch:17148 comp.lang.misc:5167 Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!purdue!mentor.cc.purdue.edu!l.cc.purdue.edu!cik From: cik@l.cc.purdue.edu (Herman Rubin) Newsgroups: comp.arch,comp.lang.misc Subject: Re: An example of a program (long) Summary: A description of what the program is doing Keywords: coding efficiency, use of peculiar instructions. Message-ID: <2363@l.cc.purdue.edu> Date: 16 Jul 90 11:36:59 GMT References: <2362@l.cc.purdue.edu> Followup-To: comp.arch,comp.lang.misc Organization: Purdue University Statistics Department Lines: 57 One of the readers has complained that he was unable to figure out what my program was doing. Therefore, I am posting a description of the algorithm and how it was arrived at. In article <2362@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes: > > There have been request for me to post a program which points out some > of the problems with HLLs. This is written in "sort of" C. There may > be minor errors. This program is rather long, and is rather incomplete. For more details of the actual program, see the article. This is a fairly typical example of an "infinite precision" procedure. Only a few bits are examined, and the procedure is complete for its intended use except for the copying of random bits. The code produced was produced at one sitting. The algorithm was worked out the previous day, and only the w code had been done before. ----------- It is desired to produce a random variable which is a bit mask m with probability 2(1 - e^(-.5)) and an impossible result the rest of the time, with the property that a uniform random variable with the bits of the mask cleared will have the distribution of the fractional part of an exponential random variable with mean 2. This procedure to accomplish this is to first produce a random variable such that P(J = j) = e^(-.5) * 2^(1-j / j! Then the mask consists of 1's in those places forced 0 by a bit-by-bit process of minimizing j uniform random variables, or something with that distribution. With the remaining probability, an easily tested impossible result is returned. This is to be accomplished using few random bits. Ignoring the exponential factor, the measure for 1 is 1, for 2 is 2^(-2), for 3 and 4 together is 3*2^(-6, for 5 it is C*2^(-11), where C = 16/15. and an upper bound for 6 or more is C*2^(-14). The actual bit patterns are instructive. The additional bits coming from C, as well as the additional bits needed to separate 3 from 4, are useful in obtaining the mask. Also, the exponential is .5 + .125*(1 - 1/6*(1 - 1/8*(1 - 1/10* ...)))) So the procedure is to first get a random variable which has the desired measure with probability .5, and also with probability .125. In the .125 case, appropriate procedures are used to correct for the additional factor. The "6 or more" is further refined, with rejection possibilities. Then the appropriate output procedures are used to get a mask with the right properties. A geometric random variable with p=.5 costs 2 bits average, and a test on one bit costs 1 bit. The expect number of bits used is < 3. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!cik(UUCP)