Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!uunet!convex!usenet
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.lang.perl
Subject: Re: Calculating XOR checksums? Fast splitting?
Message-ID: <1991Jun03.181303.15826@convex.com>
Date: 3 Jun 91 18:13:03 GMT
References: <GNB.91Jun3135656@leo.bby.oz.au>
Sender: usenet@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Organization: CONVEX Software Development, Richardson, TX
Lines: 64
Nntp-Posting-Host: pixel.convex.com

From the keyboard of gnb@bby.oz.au (Gregory N. Bond):
:Given a string, what is the fastest way to calculate the XOR of all
:bytes?  here is what I used:
:
:    $bcc = 0;
:    grep ($bcc ^= $_, unpack("C" x length($sc_data), $sc_data));
:
:But is there a better way without having to construct and destroy the
:array? (I need this to be fairly quick...).

Well, you should use "C*" as the unpack template instead of
"C" x length($sc_data); that alone saves about 15%.
The real shame is that it has to be an XOR checksum; if you could
tolerate an additive one, then you could just use this:

    $bcc = unpack('%31C*', $sc_data);

and it would run in about 5% of the time that your current loop takes.
If XOR checksums are that common, perhaps you could prevail upon Larry
to add such a feature.  Maybe "^31C*" or some such.
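
For comparison, here is a rough sketch of the two checksums side by
side.  It assumes a modern Perl (lexical "my", "use strict"); the sub
names xor_checksum and add_checksum are made up for illustration.

```perl
use strict;
use warnings;

# XOR of all bytes, using a single "C*" template rather than
# "C" x length($data).
sub xor_checksum {
    my ($data) = @_;
    my $bcc = 0;
    $bcc ^= $_ for unpack('C*', $data);
    return $bcc;
}

# Additive checksum: the "%31" prefix makes unpack return the sum of
# the unpacked values, kept to 31 bits, instead of the values themselves.
sub add_checksum {
    my ($data) = @_;
    return unpack('%31C*', $data);
}

my $sc_data = "ABC";
printf "xor=%d sum=%d\n", xor_checksum($sc_data), add_checksum($sc_data);
# prints "xor=64 sum=198"
```

The XOR version still builds a temporary list of byte values; only
the additive one is computed entirely inside unpack.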

:And a second point, what is likely to be faster to split a string into
:several subfields?  A regexp:
:	$str =~ /(.{32})(.{24})(.{8})/;
:	($a, $b, $c) = ($1, $2, $3);
:or unpack:
:	($a, $b, $c) = unpack("c32 c24 c8", $str);
:or substr:
:	$a = substr($str, 0, 32);
:	$b = substr($str, 32, 24);
:	$c = substr($str, 32 + 24, 8);


You could save about 11% if you did a direct assignment for the regexp:

    ($a, $b, $c) = $str =~ /(.{32})(.{24})(.{8})/;

By using unpack, you save an additional 25%.  Your unpack, by the way,
is wrong.  You need an A format there, not a c: c unpacks each byte
as a signed number, whereas A returns the text of the field.
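
To see the difference between the two formats, something like the
following should do.  The sub names and the sample record are
invented for the example, and it assumes a modern Perl.

```perl
use strict;
use warnings;

# "A" fields come back as strings, with trailing spaces stripped.
sub split_text {
    my ($str) = @_;
    return unpack('A32 A24 A8', $str);
}

# "c" fields come back as one signed number per byte: 64 values here.
sub split_codes {
    my ($str) = @_;
    return unpack('c32 c24 c8', $str);
}

# A hypothetical 64-byte fixed-width record, space padded.
my $str = sprintf('%-32s%-24s%-8s', 'name', 'city', 'id');

my @text  = split_text($str);
my @codes = split_codes($str);
printf "%d text fields, %d byte values\n", scalar(@text), scalar(@codes);
# prints "3 text fields, 64 byte values"
```

If you need the padding preserved, a lowercase "a" returns each field
verbatim.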

Perhaps counterintuitively, it is a wee bit faster in this
case to use the 3 substr()s over the unpack().  However, when 
you have 10 fields, it is faster to use unpack() than either
of the other methods, with substr()s taking ~7% longer and
regexp taking ~20% longer.
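
For the many-field case, here is a sketch of how the unpack and
substr variants might be written for an arbitrary list of widths.
Both subs are hypothetical helpers, not anything built in, and they
use "a" rather than "A" so that the two return identical strings.

```perl
use strict;
use warnings;

# Build one template such as "a2 a3 a1" and unpack everything at once.
sub split_unpack {
    my ($str, @widths) = @_;
    my $template = join ' ', map { "a$_" } @widths;
    return unpack($template, $str);
}

# The same split done with one substr() call per field.
sub split_substr {
    my ($str, @widths) = @_;
    my $pos = 0;
    my @out;
    for my $w (@widths) {
        push @out, substr($str, $pos, $w);
        $pos += $w;
    }
    return @out;
}

print join('|', split_unpack('aabbbc', 2, 3, 1)), "\n";   # prints "aa|bbb|c"
print join('|', split_substr('aabbbc', 2, 3, 1)), "\n";   # prints "aa|bbb|c"
```

In a timing run you would want to build the template once outside
the loop, so that the join is not charged to unpack.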


It's pretty easy to measure these yourself.  I basically wrapped each
case in a loop like this to find what the differences were:

    $COUNT = 10000;
    ($u, $s) = times;
    for ($I = 0; $I < $COUNT; $I++) {
        # some operation
    }
    ($nu, $ns) = times;
    printf "%8.4fu %8.4fs\n", ($nu - $u), ($ns - $s);
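
The same loop can be wrapped in a sub so each candidate is timed the
same way.  This is only a sketch: time_it is a made-up name, it
assumes a Perl with code references, and the cost of calling the
closure is included in every measurement.

```perl
use strict;
use warnings;

# Run $code $count times; return the user and system CPU seconds used.
sub time_it {
    my ($count, $code) = @_;
    my ($u, $s) = times;          # user and system time before the loop
    $code->() for 1 .. $count;
    my ($nu, $ns) = times;        # and after
    return ($nu - $u, $ns - $s);
}

my $str = 'x' x 64;
my ($user, $sys) = time_it(10_000, sub {
    my @f = unpack('A32 A24 A8', $str);
});
printf "%8.4fu %8.4fs\n", $user, $sys;
```

The standard Benchmark module does the same job with less
hand-rolling, if you have it available.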


--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
	    "Perl is to sed as C is to assembly language."  -me