Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!rpi!uupsi!sunic!dkuug!freja.diku.dk!njk From: njk@diku.dk (Niels J|rgen Kruse) Newsgroups: comp.lang.misc Subject: Re: Compiler Costs Message-ID: <1990Jul18.082518.18235@diku.dk> Date: 18 Jul 90 08:25:18 GMT References: <1319@fs1.ee.ubc.ca> <56704@lanl.gov> <=AP4-:8@ggpc2.ferranti.com> Organization: Department Of Computer Science, University Of Copenhagen Lines: 31 I wrote: > > Better yet, xor the bytes together and then use the fine lookup table. > > Parity is just the xor of all the bits in any old order, remember. sra@ecs.soton.ac.uk (Stephen Adams) writes: >Indeed, you can use this property and `fold' the byte in >half using shift and xor. [and so on] >On a RISC it should be about as fast as a lookup table as >all the work is done in registers. I really doubt that is true in general. > x = datum; > x^=(x>>4); > x^=(x>>2); > x^=(x>>1); > parity_bit = x&1; I count 7 operations here, which on a typical RISC require 7 clocks. Typical cost of a load from cache is 2 clocks. At worst, a table lookup require 2 loads, one of which is loading the address of the table from the constant pool. We can expect the compiler to move the latter out of surrounding loops. It is not unreasonable to expect an often used and small lookup table to be in cache. I suspect you are thinking of a RISC like the Archimedes, that doesn't have cache and has an unusual instruction set, which allow shifting operands in the same instruction as an other operation. (I am not too familiar with the ARM). -- Niels J|rgen Kruse DIKU Graduate njk@diku.dk