Path: utzoo!attcan!uunet!lll-winken!ncis!helios.ee.lbl.gov!nosc!ucsd!ucbvax!decwrl!labrea!polya!pratt
From: pratt@polya.Stanford.EDU (Vaughan R. Pratt)
Newsgroups: comp.lang.misc
Subject: Re: Clever programming tricks wanted
Summary: Reference to a 1976 paper containing many such tricks (long)
Message-ID: <6210@polya.Stanford.EDU>
Date: 19 Jan 89 18:47:24 GMT
References: <4061@hubcap.UUCP>
Reply-To: pratt@cs.Stanford.EDU (Vaughan R. Pratt)
Organization: Stanford University
Lines: 116
Followup-To:

In article <4061@hubcap.UUCP> mcvax!etive.ed.ac.uk!gvw@uunet.UU.NET (MOBY) writes:
>I'm looking for "clever" programming tricks.  For example,
>consider the following problem:
>
>    Given a computer word W in which one or more bits are set,
>  .....
>
>Greg Wilson
>-- 
>Edinburgh Concurrent Supercomputer Project

Many programming tricks for bit-vector machines can be found in the
following article.

Pratt, V.R., and L.J. Stockmeyer, "A Characterization of the Power of
Vector Machines", JCSS, 12, 2, 1976, 198-221.  (A preliminary version
appeared in STOC-74.)

The machine model is a bit-vector machine with the following
register-to-register instructions.  (This is freely paraphrased from the
original to make it look more like C.)

	Ri = Rj << Rk, Ri = Rj >> Rk;		Rk in 2's complement
	Ri op= Rj;		for any of the 16 binary Boolean ops
	Ri = 0;
	if (Ri == 0) goto L;	Any instruction may be labeled

Note absence of arithmetic operations, see comments on this below.

The machine in the paper had infinite-to-the-left registers, but we
should have made the point more clearly that our results applied
equally well to finite registers provided one puts up with some of the
algorithms working only for inputs of size smaller than a word.  Thus
although n-bit numbers can be added in n-bit registers, their
multiplication requires n^2-bit registers.  These limitations are reflected in
the second column in the table below.

We also considered random access memory (accessed by indirect reference
*Ri, with Ri interpreted as unsigned binary) and nondeterminism.
However none of the following algorithms require either of these capabilities,
and can be implemented using only a handful of registers.

Here are some of the functions we computed, along with the length of register
(in bits) needed, and the time needed.  All logs to base 2.  Constant factors
omitted (but all reasonably small, i.e. nothing in the 100's).

Data structures such as lists of numbers and matrices are stored in
single registers, e.g. a 4x4 matrix of 4-bit numbers would be stored in
a single 64-bit register.  The expectation was that these algorithms
would be at their most useful in a machine with very long words.

To keep things simple I'll assume that all numbers are n bits, all
lists are of length n and consist of n-bit numbers (so require
registers with n^2 bits), and all matrices are nxn and consist of n-bit
numbers (unless referred to as Boolean matrices), so require registers
with n^3 bits (n^2 for Boolean matrices).  Algorithms may require
longer registers than needed to hold the inputs, e.g. matrix
multiplication needs an additional factor of n.

Operation			Length of reg.  Time

Count 1's				n	log n
Ri = Rj+Rk				n	log n
Add n numbers				n^2	log n
Ri = Rj*Rk				n^2	log n
Boolean matrix transposition		n^2	log n
Boolean matrix multiplication		n^3	log n
Boolean matrix transitive closure	n^3	log^2 n
Integer matrix addition			n^3	log n
Integer matrix multiplication		n^4	log n
Simulation of nondeterministic
	    space S Turing machine	2^2S	S^2

Various other programming tricks appear in the paper as subroutines for the
above.  A particularly useful trick is the construction of masks Mi
having a 1 at bit j just when j in binary has a 1 at bit i; we showed how
to get the low-order 2^i bits of Mi in i steps, and then to construct
each of M(i-1), M(i-2), ..., M0 in constant time.

We also showed that a deterministic space T^2 Turing machine could
simulate a nondeterministic time T vector machine, with or without
random access memory, under the restriction that registers containing
shift distances could themselves only be shifted by constant distances
(to prevent the sort of blowup you get when you iterate
Ri = Ri << Ri).  (This restriction was removed several years later by
Janos Simon using an ingenious compact coding of the enormous numbers
that this iteration generated.)

For polynomial space (PSPACE) on Turing machines, nondeterminism does
not help.  That is, PSPACE = NSPACE.  Since the above simulations are
to within a polynomial (namely a quadratic), it follows that, on bit
vector machines, P = NP.

This paper was the first of a series showing this relationship between serial
and parallel computers for a variety of parallel architectures, leading to
a general thesis that time on parallel computers behaved like space on serial
computers, to within a square and independently of determinism or
nondeterminism.  It is wide open whether anything better than quadratic
is possible in either direction of the simulation.

Since addition takes time log n, many programs would run faster if the
instruction set were expanded to include an add instruction.  However
the examples above of adddition of n numbers, multiplication of two
numbers, and multiplication of nxn arithmetic matrices, show that this
overhead of log n can often be absorbed into the other log n's, so the
advantage of addition as a primitive is not as great as it might have
seemed.

We did not track down the the history of the log n solution to the problem
of counting bits.  Peter Wegner informed us that he had been party to a 1960
paper describing this solution, though I don't have a reference.
	Vaughan Pratt
-- 
		_______________________________________________________
		Vaughan Pratt   pratt@polya.stanford.edu   415-494-2545
		=======================================================