Path: utzoo!attcan!uunet!lll-winken!ncis!helios.ee.lbl.gov!nosc!ucsd!ucbvax!decwrl!labrea!polya!pratt From: pratt@polya.Stanford.EDU (Vaughan R. Pratt) Newsgroups: comp.lang.misc Subject: Re: Clever programming tricks wanted Summary: Reference to a 1976 paper containing many such tricks (long) Message-ID: <6210@polya.Stanford.EDU> Date: 19 Jan 89 18:47:24 GMT References: <4061@hubcap.UUCP> Reply-To: pratt@cs.Stanford.EDU (Vaughan R. Pratt) Organization: Stanford University Lines: 116 Followup-To: In article <4061@hubcap.UUCP> mcvax!etive.ed.ac.uk!gvw@uunet.UU.NET (MOBY) writes: >I'm looking for "clever" programming tricks. For example, >consider the following problem: > > Given a computer word W in which one or more bits are set, > ..... > >Greg Wilson >-- >Edinburgh Concurrent Supercomputer Project Many programming tricks for bit-vector machines can be found in the following article. Pratt, V.R., and L.J. Stockmeyer, "A Characterization of the Power of Vector Machines", JCSS, 12, 2, 1976, 198-221. (A preliminary version appeared in STOC-74.) The machine model is a bit-vector machine with the following register-to-register instructions. (This is freely paraphrased from the original to make it look more like C.) Ri = Rj << Rk, Ri = Rj >> Rk; Rk in 2's complement Ri op= Rj; for any of the 16 binary Boolean ops Ri = 0; if (Ri == 0) goto L; Any instruction may be labeled Note absence of arithmetic operations, see comments on this below. The machine in the paper had infinite-to-the-left registers, but we should have made the point more clearly that our results applied equally well to finite registers provided one puts up with some of the algorithms working only for inputs of size smaller than a word. Thus although n-bit numbers can be added in n-bit registers, their multiplication requires n^2-bit registers. These limitations are reflected in the second column in the table below. We also considered random access memory (accessed by indirect reference *Ri, with Ri interpreted as unsigned binary) and nondeterminism. However none of the following algorithms require either of these capabilities, and can be implemented using only a handful of registers. Here are some of the functions we computed, along with the length of register (in bits) needed, and the time needed. All logs to base 2. Constant factors omitted (but all reasonably small, i.e. nothing in the 100's). Data structures such as lists of numbers and matrices are stored in single registers, e.g. a 4x4 matrix of 4-bit numbers would be stored in a single 64-bit register. The expectation was that these algorithms would be at their most useful in a machine with very long words. To keep things simple I'll assume that all numbers are n bits, all lists are of length n and consist of n-bit numbers (so require registers with n^2 bits), and all matrices are nxn and consist of n-bit numbers (unless referred to as Boolean matrices), so require registers with n^3 bits (n^2 for Boolean matrices). Algorithms may require longer registers than needed to hold the inputs, e.g. matrix multiplication needs an additional factor of n. Operation Length of reg. Time Count 1's n log n Ri = Rj+Rk n log n Add n numbers n^2 log n Ri = Rj*Rk n^2 log n Boolean matrix transposition n^2 log n Boolean matrix multiplication n^3 log n Boolean matrix transitive closure n^3 log^2 n Integer matrix addition n^3 log n Integer matrix multiplication n^4 log n Simulation of nondeterministic space S Turing machine 2^2S S^2 Various other programming tricks appear in the paper as subroutines for the above. A particularly useful trick is the construction of masks Mi having a 1 at bit j just when j in binary has a 1 at bit i; we showed how to get the low-order 2^i bits of Mi in i steps, and then to construct each of M(i-1), M(i-2), ..., M0 in constant time. We also showed that a deterministic space T^2 Turing machine could simulate a nondeterministic time T vector machine, with or without random access memory, under the restriction that registers containing shift distances could themselves only be shifted by constant distances (to prevent the sort of blowup you get when you iterate Ri = Ri << Ri). (This restriction was removed several years later by Janos Simon using an ingenious compact coding of the enormous numbers that this iteration generated.) For polynomial space (PSPACE) on Turing machines, nondeterminism does not help. That is, PSPACE = NSPACE. Since the above simulations are to within a polynomial (namely a quadratic), it follows that, on bit vector machines, P = NP. This paper was the first of a series showing this relationship between serial and parallel computers for a variety of parallel architectures, leading to a general thesis that time on parallel computers behaved like space on serial computers, to within a square and independently of determinism or nondeterminism. It is wide open whether anything better than quadratic is possible in either direction of the simulation. Since addition takes time log n, many programs would run faster if the instruction set were expanded to include an add instruction. However the examples above of adddition of n numbers, multiplication of two numbers, and multiplication of nxn arithmetic matrices, show that this overhead of log n can often be absorbed into the other log n's, so the advantage of addition as a primitive is not as great as it might have seemed. We did not track down the the history of the log n solution to the problem of counting bits. Peter Wegner informed us that he had been party to a 1960 paper describing this solution, though I don't have a reference. Vaughan Pratt -- _______________________________________________________ Vaughan Pratt pratt@polya.stanford.edu 415-494-2545 =======================================================