Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!hsdndev!cmcl2!lanl!cochiti.lanl.gov!jlg
From: jlg@cochiti.lanl.gov (Jim Giles)
Newsgroups: comp.arch
Subject: Re: new instructions
Message-ID: <24263@lanl.gov>
Date: 22 May 91 21:11:59 GMT
References: <1991May22.001620.751@craycos.com> <1991May23.084258.5062@kithrup.COM> <24216@lanl.gov> <1991May23.192557.7558@kithrup.COM>
Sender: news@lanl.gov
Organization: Los Alamos National Laboratory
Lines: 26

In article <1991May23.192557.7558@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
|> [...] How long would
|>
|>     char *byte = (char *)&word;
|>     pop_count = table[byte[0]] + table[byte[1]] + table[byte[2]] +
|>                 table[byte[3]];
|>
|> take on a machine with somewhat better memory accesses?  Say, an R6000, or
|> even a Sparc?

First, this code only does a pop count on a 32-bit object, not a 64-bit
one.  Second, I mentioned this case in my last posting: I would bet that
an implementation of pop count as a hardware instruction on either of
these machines (using the technology they were built with) would be _ONE_
clock long.  The above sequence takes in excess of 10.

|> And don't forget that, for serial code, the R6000 is faster than the Cray.
|> So that doesn't quite count as a "slow machine," does it?

What is the relevance of a comparison of the R6000 to the Cray in the
context of this discussion?  The issue is whether pop count can be
performed as quickly in software as in hardware.  This is an issue to be
decided on the basis of each machine individually.  The R6000 would be
even _faster_ if pop count were a hardware instruction!!

J. Giles
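
For reference, the table-lookup approach quoted above can be sketched for both the 32-bit and the 64-bit case. This is only an illustration under stated assumptions: the names `init_table`, `pop_count32` and `pop_count64` are hypothetical and do not appear in the thread, the table is assumed to hold the bit count of each possible byte value, and bytes are extracted with shifts and masks rather than through a `char *`, which avoids negative indices on machines where `char` is signed.

```c
#include <stdint.h>

/* Hypothetical lookup table: table[i] is the number of 1 bits in byte i.
 * Static storage means table[0] starts out as 0. */
static unsigned char table[256];

/* Fill the table: the count for i is its low bit plus the count for i >> 1,
 * which has already been computed because i >> 1 < i. */
static void init_table(void)
{
    for (int i = 1; i < 256; i++)
        table[i] = (unsigned char)((i & 1) + table[i >> 1]);
}

/* 32-bit pop count: four table loads plus shifts, masks and adds. */
static unsigned pop_count32(uint32_t word)
{
    return table[word & 0xff] + table[(word >> 8) & 0xff] +
           table[(word >> 16) & 0xff] + table[word >> 24];
}

/* 64-bit pop count: two 32-bit halves, so eight table loads in total. */
static unsigned pop_count64(uint64_t word)
{
    return pop_count32((uint32_t)word) + pop_count32((uint32_t)(word >> 32));
}
```

After calling `init_table()` once, `pop_count32(0xF0F0F0F0u)` gives 16 and `pop_count64` of an all-ones word gives 64. The 64-bit version needs twice the loads and adds of the 32-bit one, which is the cost being weighed against a single hardware instruction.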