Newsgroups: comp.arch
Path: utzoo!utgpu!news-server.csri.toronto.edu!torsqnt!hybrid!scifi!bywater!uunet!kithrup!sef
From: sef@kithrup.COM (Sean Eric Fagan)
Subject: Re: new instructions
Organization: Kithrup Enterprises, Ltd.
Date: Thu, 23 May 1991 19:25:57 GMT
Message-ID: <1991May23.192557.7558@kithrup.COM>
References: <1991May22.001620.751@craycos.com> <1991May23.084258.5062@kithrup.COM> <24216@lanl.gov>

In article <24216@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>It would amaze me to find any machine (on which the test could be done)
>where a table lookup came within an order of magnitude of a hardware
>instruction on these functions.

How about a Cyber?  A Cyber, without the pop-count hardware, takes something
like 60 cycles to do a popcount.  And the Cray has lousy memory-access
times, and isn't a byte-addressable machine.

How long would

	unsigned char *byte = (unsigned char *)&word;
	pop_count = table[byte[0]] + table[byte[1]] + table[byte[2]] + table[byte[3]];

take on a machine with somewhat better memory accesses?  Say, an R6000, or
even a Sparc?  (The bytes are unsigned so that a byte with its high bit set
can't turn into a negative index into table.)

And don't forget that, for serial code, the R6000 is faster than the Cray.
So that doesn't quite count as a "slow machine," does it?

So here is a way of doing pop-count, quite quickly, that doesn't require any
special instructions: it's possible for a compiler to put the byte[x] into
registers and not have to access memory, the first reference to table could
put quite a bit of table into a cache, and if you have pipelined loads, it
*does* go *very* quickly.  And it will work the same way, if not faster, on
later versions of the processor.  This is not true with instructions that
don't get a lot of use: witness the 68040 and transcendental instructions.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
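
For anyone who wants to time the table-lookup pop count from the post, here
is a minimal sketch of it filled out into compilable C.  The post only shows
the lookup itself, so several things here are assumptions: a 32-bit word, a
256-entry table built at compile time by the B2/B4/B6 macros, and the names
pop_count32, pop_count_naive and check_pop_count, which are made up for the
illustration.  It is one way to fill in the details, not necessarily what the
poster had in mind.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* table[b] is the number of set bits in the byte value b (0..255).
   Each macro level prepends two more bits, so the table is generated
   4, then 16, then 64 entries at a time, entirely at compile time. */
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)
#define B6(n) B4(n), B4(n + 1), B4(n + 1), B4(n + 2)
static const unsigned char table[256] = { B6(0), B6(1), B6(1), B6(2) };

/* Table-lookup pop count of a word, assumed here to be 32 bits wide.
   The bytes are unsigned so they are always valid indices into table. */
static unsigned pop_count32(uint32_t word)
{
    unsigned char byte[4];
    memcpy(byte, &word, sizeof byte);   /* byte order is irrelevant to a sum */
    return table[byte[0]] + table[byte[1]] + table[byte[2]] + table[byte[3]];
}

/* Bit-at-a-time reference, used only to cross-check the table version. */
static unsigned pop_count_naive(uint32_t word)
{
    unsigned n = 0;
    while (word) {
        n += word & 1u;
        word >>= 1;
    }
    return n;
}

/* Hypothetical self-check: both versions must agree on a few samples. */
static void check_pop_count(void)
{
    const uint32_t samples[] = { 0u, 1u, 0xFFu, 0x80000000u,
                                 0xDEADBEEFu, 0xFFFFFFFFu };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(pop_count32(samples[i]) == pop_count_naive(samples[i]));
}
```

Calling check_pop_count() once from main and then timing pop_count32 in a
loop against whatever the hardware offers would be the obvious experiment.
The table is 256 bytes, so on a machine with a data cache it should stay
resident after the first few lookups.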