Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!rpi!uupsi!sunic!dkuug!freja.diku.dk!njk
From: njk@diku.dk (Niels J|rgen Kruse)
Newsgroups: comp.lang.misc
Subject: Re: Compiler Costs
Message-ID: <1990Jul18.082518.18235@diku.dk>
Date: 18 Jul 90 08:25:18 GMT
References: <1319@fs1.ee.ubc.ca> <56704@lanl.gov> <=AP4-:8@ggpc2.ferranti.com> <SRA.90Jul17105145@alonzo.ecs.soton.ac.uk>
Organization: Department Of Computer Science, University Of Copenhagen
Lines: 31

I wrote:
> > Better yet, xor the bytes together and then use the fine lookup table.
> > Parity is just the xor of all the bits in any old order, remember.

sra@ecs.soton.ac.uk (Stephen Adams) writes:
>Indeed, you can use this property and `fold' the byte in
>half using shift and xor.  [and so on]
>On a RISC it should be about as fast as a lookup table as
>all the work is done in registers.

I really doubt that is true in general.

>       x = datum;
>       x^=(x>>4);
>       x^=(x>>2);
>       x^=(x>>1);
>       parity_bit = x&1;

I count 7 operations here, which on a typical RISC require 7 clocks.

Typical cost of a load from cache is 2 clocks.  At worst, a
table lookup require 2 loads, one of which is loading the
address of the table from the constant pool.  We can expect the
compiler to move the latter out of surrounding loops.  It is not
unreasonable to expect an often used and small lookup table to
be in cache.

I suspect you are thinking of a RISC like the Archimedes, that
doesn't have cache and has an unusual instruction set, which
allow shifting operands in the same instruction as an other
operation.  (I am not too familiar with the ARM).
-- 
Niels J|rgen Kruse 	DIKU Graduate 	njk@diku.dk