Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!hsdndev!cmcl2!lanl!cochiti.lanl.gov!jlg
From: jlg@cochiti.lanl.gov (Jim Giles)
Newsgroups: comp.arch
Subject: Re: new instructions
Message-ID: <24263@lanl.gov>
Date: 22 May 91 21:11:59 GMT
References: <1991May22.001620.751@craycos.com> <1991May23.084258.5062@kithrup.COM> <24216@lanl.gov> <1991May23.192557.7558@kithrup.COM>
Sender: news@lanl.gov
Organization: Los Alamos National Laboratory
Lines: 26

In article <1991May23.192557.7558@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
|> [...]                                                  How long would
|> 
|> 	char *byte = (char *)&word;
|> 	pop_count = table[byte[0]] + table[byte[1]] + table[byte[2]] +
|> 		table[byte[3]];
|> 
|> take on a machine with somewhat better memory accesses?  Say, an R6000, or
|> even a Sparc?

First, this code only does a pop count on a 32-bit object, not 64.  Second,
I mentioned this case in my last posting: I would bet that an implementation
of pop count as a hardware instruction on either of these machines (using the
technology they were built with) would be _ONE_ clock long.  The above
sequence takes in excess of 10 clocks.
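
For concreteness, here is a rough sketch of what the table-lookup approach
looks like once it is stretched to 64 bits.  The function names popcount32
and popcount64 and the macro-built table are illustrative assumptions, not
anything from the quoted article.  The sketch also picks bytes out with
shifts and masks instead of the char-pointer cast, so that a signed char
can never produce a negative table index.

```c
#include <stdint.h>

/* Illustrative 256-entry table: table[i] holds the number of 1 bits in i.
   It is built at compile time; each macro level accounts for two more bits. */
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)
#define B6(n) B4(n), B4(n + 1), B4(n + 1), B4(n + 2)
const unsigned char table[256] = { B6(0), B6(1), B6(1), B6(2) };

/* 32-bit pop count: four table lookups and three adds (hypothetical name). */
unsigned popcount32(uint32_t w)
{
    return table[w & 0xff] + table[(w >> 8) & 0xff] +
           table[(w >> 16) & 0xff] + table[w >> 24];
}

/* 64-bit pop count: two 32-bit halves, so eight lookups and seven adds
   (hypothetical name). */
unsigned popcount64(uint64_t w)
{
    return popcount32((uint32_t)w) + popcount32((uint32_t)(w >> 32));
}
```

Going from 32 to 64 bits doubles the number of loads and adds in the
software version, on top of the shifting and masking needed to extract
each byte.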

|> And don't forget that, for serial code, the R6000 is faster than the Cray.
|> So that doesn't quite count as a "slow machine," does it?

What is the relevance of a comparison of the R6000 to the Cray in the
context of this discussion?  The issue is whether pop count can be
performed as quickly in software as in hardware.  This is an issue to
be decided on the basis of each machine individually.  The R6000 would
be even _faster_ if pop count were a hardware instruction!!

J. Giles