Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!sol.ctr.columbia.edu!srcsip!msi.umn.edu!cs.umn.edu!quest!zeno!gene From: gene@zeno.mn.org (Gene H. Olson) Newsgroups: comp.lang.c Subject: Re: count of bits in a long Keywords: C Message-ID: <1990Oct12.154137.28117@zeno.mn.org> Date: 12 Oct 90 15:41:37 GMT References: <826@neccan.oz> Organization: Smartware Consulting Lines: 110 peter@neccan.oz (Peter Miller) writes: >/* > * A while back I posted a solution to the "count of bits set in a > * long" question, suggesting that HACKMEM 169 was a very fast way to do it. > * This program demonstrates the technique. > * > * The HACKMEM 169 method uses a modulo to sum the digits. > * A number of people have asked me "how can a division be faster than a loop?" > * This program demonstrates how. The modulo is a special case, > * and can be unwound into a tight loop if your machine does not > * have a fast divide opcode (although many modern machines have a 1 cycle > * divide opcode, except for we-really-believe-our-press-hype risc > * manufacturers). > * > * This program was run on a 386/20 under unix with the following results: > * Func sec/iter scaled > * nbits 0.00018310547 1.000 hackmem 169 > * nbits1 0.0006980896 3.812 hackmem 169 no % operator > * nbits2 0.0016822815 9.188 count lsb's loop > * nbits3 0.0037384033 20.417 > * nbits4 0.0034370422 18.771 > * nbits5 0.0037765503 20.625 > * nbits6 0.0035247803 19.250 > */ > ..... A program follows this. To this program I have added a bug fix (the function table was wrong) and another function (nbit7) which is MY FAVORITE WAY of counting bits. The patches are at the end of this posting.... Anyway, I got much better results on both the 386 (which has a multiply) and on a SPARC (which doesn't) than any of the functions in the original program. The results were: 386 SPARC ------------------------------ ------------------------------ nbits 0.00020980835 1.774 nbits 0.00096893311 15.875 nbits1 0.00038909912 3.290 nbits1 0.00012969971 2.125 nbits2 0.00075149536 6.355 nbits2 0.00031280518 5.125 nbits3 0.0015411377 13.032 nbits3 0.00067138672 11.000 nbits4 0.0016479492 13.935 nbits4 0.00065231323 10.688 nbits5 0.0015525818 13.129 nbits5 0.00067520142 11.062 nbits6 0.0021286011 18.000 nbits6 0.00086212158 14.125 nbits7 0.00011825562 1.000 nbits7 6.1035156e-05 1.000 The nbit7 technique is a simple routine which shifts and sums bits in place until all bits have summed in the lower 8 bits of the word. The implementation here requires 32 bit longs, but it is easily extensible to 64 bits with a mask change and the additional shift shown. Enjoy, ------------- Remainder of Posting is a Cdiff Patchfile ---------- *** orig.c Fri Oct 12 08:56:22 1990 --- gene.c Fri Oct 12 09:23:51 1990 *************** *** 215,220 **** --- 215,236 ---- return count; } + int + nbits7(n) + register unsigned long n ; + { + n = ((n >> 1) & 0x55555555) + (n & 0x55555555) ; + n = ((n >> 2) & 0x33333333) + (n & 0x33333333) ; + n = ((n >> 4) + n) & 0x0F0F0F0F ; + n = ((n >> 8) + n) ; + #if LONG64 + n = (n >> 16) + n ; + return ((n >> 32) + n) & 0xFF ; + #else + return ((n >> 16) + n) & 0xFF ; + #endif + } + /* * build a long random number. * The standard rand() returns a 15bit number. *************** *** 275,282 **** { "nbits2", nbits2 }, { "nbits3", nbits3 }, { "nbits4", nbits4 }, ! { "nbits5", nbits3 }, ! { "nbits6", nbits4 }, }; main() --- 291,299 ---- { "nbits2", nbits2 }, { "nbits3", nbits3 }, { "nbits4", nbits4 }, ! { "nbits5", nbits5 }, ! { "nbits6", nbits6 }, ! { "nbits7", nbits7 }, }; main() _________________________________________________________________________ __ / ) Gene H. Olson uunet!digibd!zeno!gene / __ _ __ _ Smartware Consulting (__/ _(/_//_(/_ Minneapolis, MN (612) 824-9108