Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!spool.mu.edu!uunet!convex!usenet
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.unix.programmer
Subject: Re: looking for sysv sum(1) algorithm
Message-ID: <1991Jun03.055711.8465@convex.com>
Date: 3 Jun 91 05:57:11 GMT
Article-I.D.: convex.1991Jun03.055711.8465
References:
Sender: usenet@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Distribution: comp
Organization: CONVEX Software Development, Richardson, TX
Lines: 65
Nntp-Posting-Host: pixel.convex.com

From the keyboard of herbie@dec07.cs.monash.edu.au (Andrew Herbert):
:Hello all.
:
:Can anyone tell me where I can find a description of the SysV sum(1)
:checksum algorithm, or some code which implements it?  I am using
:SysVR4, but couldn't find anything to do this in the standard libraries.

I think I can tell you.  I have no SysVr4 source code, so I had to
reverse engineer what's going on by taking a working emulation of a
SysV sum(1) program written in perl (after confirming it really does
give the same output as sum(1)) and then looking to see what perl's
doing inside.  But I get the same results, so something here must be
right.

To start with, this perl code seems to emulate the sum(1) command
fairly well, as found on a SysV system I have lying around here:

    while (<>) {
        $checksum += unpack("%31C*", $_);   # sum of the bytes in this line
        $checksum %= 65535;
        $bytes += length;
        if (eof) {                          # end of the current file
            printf "%d %d %s\n", $checksum, ($bytes+511)/512, $ARGV;
            $checksum = $bytes = 0;
        }
    }

Speed freaks might take note that the following rendition is actually
faster than the C code!  Big buffers pay off.

    while ($ARGV = shift) {
        warn("can't open $ARGV: $!"), next unless open ARGV;
        while (read(ARGV,$_,16 * 512)) {    # read 8k at a time
            $checksum += unpack("%31C*", $_);
            $checksum %= 65535;
            $bytes += length;
        }
        printf "%d %d %s\n", $checksum, ($bytes+511)/512, $ARGV;
        $checksum = $bytes = 0;
    }

Of course, this doesn't really help you to know what's going on until
you know what unpack() is doing.
Looking in perl/src/doio.c, in the function do_unpack(), you find
that what's happening is basically the following (loosely transcribed):

    checksum = 31;          /* from the %31C in unpack */
    sum = 0;
    /* string is a (char *) pointing to $_, len is its length in bytes */
    unsigned char *sp = (unsigned char *) string;
    while (len-- > 0)       /* count bytes; don't stop at a NUL */
        sum += *sp++;
    sum &= (1UL << checksum) - 1;
    return sum;

That's what's happening for each record.  If you look at the above perl
code, we add this sum to our running $checksum variable each time
through the perl while loop, and then reduce it modulo 65535 each time
(not 65536) to keep it small.  Then when each file runs out, we output
this value, the number of 512-byte blocks (rounded up), and the file's
name.

Hope this helps.

--tom
-- 
Tom Christiansen		tchrist@convex.com	convex!tchrist
	"Perl is to sed as C is to assembly language."  -me
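
For anyone who would rather have the whole loop in C than in perl, here
is a minimal sketch of the procedure described above: sum the bytes of
each chunk, mask to 31 bits, fold into a running total modulo 65535, and
count 512-byte blocks.  The names sysv_sum, sysv_sum_update,
sysv_sum_blocks and sysv_sum_file are made up for illustration.  This
mirrors the perl emulation only; it is an assumption, not something
checked against real SysV source, that sum(1) does exactly this
internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names; this follows the perl emulation, not sum(1) source. */
struct sysv_sum {
    unsigned long checksum;   /* running value, always below 65535 */
    unsigned long bytes;      /* total number of bytes seen so far */
};

/* Fold one chunk of data into the running checksum. */
void sysv_sum_update(struct sysv_sum *s, const unsigned char *buf, size_t len)
{
    unsigned long chunk = 0;
    size_t i;

    assert(s != NULL && (buf != NULL || len == 0));

    for (i = 0; i < len; i++)       /* plain byte sum, NULs included */
        chunk += buf[i];
    chunk &= 0x7fffffffUL;          /* the 31-bit mask from "%31C*" */

    s->checksum = (s->checksum + chunk) % 65535UL;   /* 65535, not 65536 */
    s->bytes += (unsigned long) len;
}

/* Number of 512-byte blocks, rounded up. */
unsigned long sysv_sum_blocks(const struct sysv_sum *s)
{
    return (s->bytes + 511UL) / 512UL;
}

/* Checksum an open stream in 8k chunks; returns 0 on success, -1 on error. */
int sysv_sum_file(FILE *fp, struct sysv_sum *s)
{
    unsigned char buf[16 * 512];
    size_t n;

    s->checksum = 0;
    s->bytes = 0;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        sysv_sum_update(s, buf, n);
    return ferror(fp) ? -1 : 0;
}
```

To use it, open the file in binary mode, call sysv_sum_file(fp, &s), and
print s.checksum, sysv_sum_blocks(&s) and the file name in the same
"%d %d %s" layout as the perl.  As a hand check, the four bytes "abc\n"
sum to 97+98+99+10 = 304 and occupy one block.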