Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!wuarchive!rex!uflorida!gatech!purdue!haven!ncifcrf!lhc!adm!smoke!gwyn
From: gwyn@smoke.brl.mil (Doug Gwyn)
Newsgroups: comp.sys.apple2
Subject: Re: CRCs
Message-ID: <15915@smoke.brl.mil>
Date: 20 Apr 91 22:47:39 GMT
References: <13688@ucrmath.ucr.edu>
Organization: U.S. Army Ballistic Research Laboratory, APG, MD.
Lines: 101

In article <13688@ucrmath.ucr.edu> rhyde@ucrmath.ucr.edu (randy hyde) writes:
>Assembly language typically works out a little better than twice as fast.

But the comparison was not really fair, since the assembly-language
example implementation did not support the same linkage as the C code.
For a function that performs so little work, function linkage overhead
is an important factor.

Also, the example C code was not optimal for the given algorithm.
By making a few minor tweaks to it I reduced the C example run time
by a factor of 26% on the host on which I read net news.  Profiling
the code showed that approximately as much time was being spent in
the main program's loop overhead as in the crc function itself.

For the revised crc function C source code:

int crc(data, curcrc)
	int data;
	register unsigned int curcrc;
{
	return (curcrc << 8) ^ crctab[(curcrc >> 8) ^ data];
}

a fairly ordinary old-technology C compiler on a Gould PowerNode
produced the following machine code:

_crc:	enter	8w,0x5		-- function linkage
	movw	9w+8w[b2],r1	-- pick up "curcrc" argument
	movw	r1,r0		-- make a second copy of it
	lslw	#8,r0		-- shift one copy left (accumulator)
	lsrw	#8,r1		-- shift the other copy right
	xorw	8w+8w[b2],r1	-- XOR shifted copy with "data" argument
	lslw	#2,r1		-- shift for indexing crctab[]
	xorw	_crctab[r1],r0	-- XOR table entry with accumulator
	retn			-- function linkage

and a similar compiler on a DEC VAX produced the following machine code:

_crc:	.word	0x800		; function linkage
	movl	8(ap),r11	; pick up "curcrc" argument
	extzv	$8,$24,r11,r0	; shift one copy left (accumulator)
	xorl2	4(ap),r0	; XOR shifted copy with "data" argument
	ashl	$8,r11,r1	; shift the other copy right
	xorl3	r1,_crctab[r0],r0 ; XOR table entry with accumulator
	ret			; function linkage

Note that there isn't very much that one could do even by hand-coded
assembly language to perform this computation much more efficiently.
(Except that the VAX actually has microcode support for a CRC
instruction!  That is an example of a situation in which use of
assembly language could be justified, since it is somewhat rare
for a C compiler to exploit such specialized instructions.  Another
possible example would be in a bottleneck loop when testing the
"carry bit" would be much more efficient than trying to do without
it; originally I thought that was going to be the justification for
coding the CRC algorithm in assembler.)

>I claim the ASM code above isn't any more difficult to understand than the
>C code (assuming, of course, that you know the algorithm that the code
>is attempting to implement).

For such a small function, the conceptual burden is indeed manageable.
For a larger example involving linked data structures, etc. I would
prefer to deal with C code over assembly code.

>From a development point of view, it would take far more time to dream
>up, implement, and verify the nibble-based C code that Doug posted than
>it would take to implement the unrolled version of the original CRC
>algorithm in assembly.

But that's comparing apples and oranges.  For the same algorithm with
the same constraints (particularly function linkage), I don't think
you've shown that assembly language code is appreciably faster than C
code, taking into account my earlier observations.

>I offer this as proof that assembly is often the better choice for a
>project.  Sure, you can find better algorithms in C which run faster than
>bad algorithms in ASM, but the end result is that you spend *far* more
>time designing, testing, and debugging the C code.  And if you can find
>a better algorithm that works in C, that same algorithm will work in
>assembly as well.

The C code, if you do it right, is immediately usable on a variety of
machines, and development cost of the algorithm is independent of the
implementation language.  Implementation of the algorithm in C is no
harder than in assembly language, and for complex algorithms it is
very nice to be able to rely on the compiler for help with the
bookkeeping.  Why do you think high-level languages were invented?

>Keep in mind, that the CRC algorithm is intrinsically O(n).  You're not
>going to do any better than this because it takes n operations to read
>the data.  Therefore, the only thing we can attack is the constant
>in the complexity of this algorithm.  ASM is really good at this.

And if you look at the above C compiler output, you'll see that C is
also really good at this, at least on decent machine architectures.

As I have previously remarked, there are legitimate uses for assembler,
but since C is about as fast, infinitely more portable, and easier to
use for large projects, I recommend C over assembler in the majority of
situations.