Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!think.com!paperboy!hsdndev!cmcl2!adm!smoke!gwyn
From: gwyn@smoke.brl.mil (Doug Gwyn)
Newsgroups: comp.sys.apple2
Subject: Re: HLLs vs. Assembly
Message-ID: <15838@smoke.brl.mil>
Date: 16 Apr 91 00:31:49 GMT
References: <13345@ucrmath.ucr.edu> <1035@stewart.UUCP> <13494@ucrmath.ucr.edu>
Organization: U.S. Army Ballistic Research Laboratory, APG, MD.
Lines: 130

In article <13494@ucrmath.ucr.edu> rhyde@gibson.ucr.edu (randy hyde) writes:
>>> I disagree with this (bit pushing is easier in assembly than in C)
>Okay, here's a challenge, code a CRC-16 algorithm in C.  I can promise you
>that I can do it in few lines of assembly language (on *any* machine) and the
>result will be easier to understand to anyone who has a basic understanding of
>the instruction set.

Here is an extract from actual code (it is executed for EVERY character
sent to or from the terminal I'm typing this on).  I did not write it
(or it would have been more readable).  It is a fair comparison, because
both versions use the same algorithm with the same interface constraints
(e.g., must be callable from C).

C version:

	typedef unsigned char	uchar;
	typedef unsigned short	ushort;

	#define	lobyte(X)	((X)&0xff)
	#define	hibyte(X)	(((X)>>8)&0xff)

	static ushort	crc16t_32[2][16]	=
	{
		0, 0140301, 0140601, 0500,
		0141401, 01700, 01200, 0141101,
		0143001, 03300, 003600, 0143501,
		02400, 0142701, 0142201, 02100,
		0, 0146001, 0154001, 012000,
		0170001, 036000, 024000, 0162001,
		0120001, 066000, 074000, 0132001,
		050000, 0116001, 0104001, 042000
	};

	int
	crc(buffer, nbytes)
		register uchar *buffer;
		int		nbytes;
	{
		register ushort	tcrc = 0;
		register int	temp;
		register int	i;

		if ( (i = nbytes) > 0 )
		do
		{
			temp = tcrc ^ *buffer++;
			tcrc = crc16t_32[0][temp & 017]
				 ^ crc16t_32[1][(temp>>4) & 017]
				 ^ (tcrc>>8);
		}
		while
			( --i > 0 );

		if ( lobyte(tcrc) != *buffer )
			i++;
		*buffer++ = lobyte(tcrc);

		if ( hibyte(tcrc) != *buffer )
			i++;
		*buffer++ = hibyte(tcrc);

		return i;
	}

MC68000 assembler version:

		data
	crc16t_3:
		word	0,0140301,0140601,0500
		word	0141401,01700,01200,0141101
		word	0143001,03300,003600,0143501
		word	02400,0142701,0142201,02100
		word	0,0146001,0154001,012000
		word	0170001,036000,024000,0162001
		word	0120001,066000,074000,0132001
		word	050000,0116001,0104001,042000
		text
		global	crc
	crc:
		link	%fp,&crcF
		movm.l	&crcM,crcS(%fp)
		mov.l	8(%fp),%a2
		mov.l	&0,%d2
		mov.w	12(%fp),%d4
		ble	crc%140
	crc%170:
		mov.b	(%a2)+,%d3
		eor.b	%d2,%d3
		mov.l	&15,%d0
		and.b	%d3,%d0
		add.l	%d0,%d0
		mov.l	&crc16t_3,%a1
		mov.w	0(%a1,%d0.l),%d0
		lsr.b	&3,%d3
		and.w	&30,%d3
		mov.l	&crc16t_3+32,%a0
		mov.w	0(%a0,%d3.w),%d1
		eor.w	%d0,%d1
		lsr.w	&8,%d2
		eor.w	%d1,%d2
		sub.w	&1,%d4
		bgt	crc%170
	crc%140:
		cmp.b	%d2,(%a2)
		beq	crc%180
		add.w	&1,%d4
	crc%180:
		mov.b	%d2,(%a2)+
		lsr.w	&8,%d2
		cmp.b	%d2,(%a2)
		beq	crc%190
		add.w	&1,%d4
	crc%190:
		mov.b	%d2,(%a2)+
		mov.w	%d4,%d0
		movm.l	crcS(%fp),&crcM
		unlk	%fp
		rts
		set	crcS,-16
		set	crcF,-22
		set	crcM,02034

Frankly, I don't think either version is very intuitive, but both
are intended to execute quickly.  I would have to wonder about anybody
who would claim that there is anything inherently more readable about
the MC68000 version.

And no fair rewriting it to use a different algorithm or to fit
different constraints!  I could certainly substantially improve the
readability of the C version too under those circumstances.
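
For readers who want to see where the table constants come from, here is
a minimal sketch in ANSI C.  It ASSUMES the common CRC-16 polynomial
x^16 + x^15 + x^2 + 1 in bit-reversed form (0xA001) with a zero initial
value; the extract above does not name its polynomial, although the
octal table entries agree with that choice.  All identifiers here
(CRC16_POLY, crc16_entry, crc16_make_tables, crc16_bitwise, crc16_nibble,
crc16_selfcheck) are hypothetical and are not part of the original code.
The sketch covers only the checksum loop, not the compare-and-store of
the two trailing bytes that crc() also performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: bit-reversed CRC-16 polynomial; not stated in the post. */
#define CRC16_POLY 0xA001u

/* Two 16-entry tables, laid out like crc16t_32 in the post:
 * [0] is indexed by the low nibble, [1] by the high nibble. */
uint16_t crc16_table[2][16];

/* Push one byte value through eight shift-and-conditional-XOR steps. */
uint16_t crc16_entry(unsigned value)
{
    uint16_t r = (uint16_t)value;
    int bit;

    for (bit = 0; bit < 8; bit++)
        r = (r & 1u) ? (uint16_t)((r >> 1) ^ CRC16_POLY)
                     : (uint16_t)(r >> 1);
    return r;
}

/* Build both nibble tables from the polynomial. */
void crc16_make_tables(void)
{
    unsigned n;

    for (n = 0; n < 16; n++) {
        crc16_table[0][n] = crc16_entry(n);       /* low nibble  */
        crc16_table[1][n] = crc16_entry(n << 4);  /* high nibble */
    }
}

/* Bit-at-a-time reference: slow, but shows the arithmetic directly. */
uint16_t crc16_bitwise(const unsigned char *buf, size_t n)
{
    uint16_t crc = 0;
    int bit;

    while (n-- > 0) {
        crc ^= *buf++;
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ CRC16_POLY)
                             : (uint16_t)(crc >> 1);
    }
    return crc;
}

/* Same loop body as the post's crc(): two nibble lookups per byte.
 * crc16_make_tables() must have been called first. */
uint16_t crc16_nibble(const unsigned char *buf, size_t n)
{
    uint16_t crc = 0;
    unsigned temp;

    while (n-- > 0) {
        temp = crc ^ *buf++;
        crc = (uint16_t)(crc16_table[0][temp & 017]
                       ^ crc16_table[1][(temp >> 4) & 017]
                       ^ (crc >> 8));
    }
    return crc;
}

/* Check that the table-driven and bit-at-a-time loops agree. */
void crc16_selfcheck(void)
{
    static const unsigned char sample[] = "HLLs vs. Assembly";

    crc16_make_tables();
    assert(crc16_nibble(sample, sizeof sample - 1)
        == crc16_bitwise(sample, sizeof sample - 1));
}
```

The nibble split works because each step is linear under XOR: the table
entry for a whole byte equals the entry for its low nibble XORed with the
entry for its high nibble, so two 16-entry tables can stand in for one
256-entry table.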