Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!umcp-cs!chris
From: chris@umcp-cs.UUCP (Chris Torek)
Newsgroups: net.lang.c
Subject: Re: Re: structure alignment question
Message-ID: <3527@umcp-cs.UUCP>
Date: Sun, 21-Sep-86 16:11:24 EDT
Article-I.D.: umcp-cs.3527
Posted: Sun Sep 21 16:11:24 1986
Date-Received: Sun, 21-Sep-86 23:38:54 EDT
References: <101@hcx1.UUCP> <7363@sun.uucp> <696@mips.UUCP> <7447@sun.uucp> <1705@mcc-pp.UUCP>
Reply-To: chris@umcp-cs.UUCP (Chris Torek)
Organization: University of Maryland, Dept. of Computer Sci.
Lines: 74

In article <1705@mcc-pp.UUCP> tiemann@mcc-pp.UUCP (Michael Tiemann) writes:
>... The last 68000 compiler I used aligned strings on WORD boundaries.
>This would cost one byte per string, half the time. But there was
>a big speed payoff: I could do word operations in my strnlen,
>strncmp, strncpy, and whatever other string processing functions
>I happened to write. ... all this "fast" code actually runs slower
>than a "dumb" byte-copy model [on a Sun-3], because the 68020 faults
>itself to death reading in 32-bit words on odd boundaries, and
>doesn't run at all on a Sun-2 because the 68010 can't read odd words.

(Does the 68020 really fault?  I thought it just did two bus accesses.)

It is not difficult to do copies in word mode iff the strings
are aligned:

	| Sun mnemonics

	| /*LINTLIBRARY*/
	| char *strcpy(to, from) char *to, *from; { *to = *from; return (to); }
	| /*UNTESTED!*/
		ENTRY(strcpy)
	TO	=	a0		| I think this works
	FROM	=	a1
		movl	sp@(4),TO	| to
		movl	sp@(8),FROM	| from
	| btst cannot take an address register as its operand, so test
	| copies of the pointers in d0 and d1 instead.
		movl	TO,d0
		btst	#0,d0		| test for odd destination
		bnes	odd0		| handle odd dst, unknown src
		movl	FROM,d1
		btst	#0,d1		| test for odd source
		bnes	hardway		| handle even dst, odd src

	| both addresses are even; do a fast strcpy
	fastcopy:
		movw	FROM@+,d0	| grab entire word
		movw	d0,d1		| need to test high byte first
		lsrw	#8,d1		| throw out low byte
		beqs	fastend		| if high byte zero, go terminate dst
		movw	d0,TO@+		| copy entire word
		tstb	d0		| and see if we are now done
		bnes	fastcopy	| do more if not
		movl	sp@(4),d0	| set return value
		rts			| and return
	fastend:
		movql	#0,d0
		movb	d0,TO@		| terminate destination string
		movl	sp@(4),d0	| set return value
		rts			| and return

	odd0:
		movl	FROM,d1		| btst cannot test an address register
		btst	#0,d1		| test for odd source
		beqs	hardway		| handle odd dst, even src
		movb	FROM@+,TO@+	| copy one byte to make even
		bnes	fastcopy	| and do rest with fast copy
		movl	sp@(4),d0	| set return value
		rts			| and return

	| one address is even, the other odd, so do it a byte at a time.
	hardway:
		movl	TO,d0		| set return value
	hardloop:
		movb	FROM@+,TO@+	| copy ...
		bnes	hardloop	| until we copy a null
		rts			| return
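
For those who would rather read C, here is a rough rendering of the same
strategy. It is only a sketch: `word_strcpy' is a made-up name, and the
two-byte memcpy calls stand in for the movw instructions (a compiler is
free to turn them into word moves or not). Like the assembly, it may
touch one byte beyond the terminating null when both pointers are even;
that is harmless for an aligned word on the machines discussed, but it
is more than C promises, so the source buffer is assumed to have room
for it.

```c
#include <stdint.h>
#include <string.h>

/* word_strcpy: hypothetical name; mirrors the assembly above. */
char *word_strcpy(char *to, const char *from)
{
    char *ret = to;
    unsigned char b[2];

    /* One address even, the other odd: they can never both be
     * word aligned, so copy a byte at a time ("hardway"). */
    if ((((uintptr_t)to ^ (uintptr_t)from) & 1) != 0) {
        while ((*to++ = *from++) != '\0')
            continue;
        return ret;
    }

    /* Both odd: copy one byte so that both become even ("odd0"). */
    if (((uintptr_t)to & 1) != 0) {
        if ((*to++ = *from++) == '\0')
            return ret;
    }

    /* Both even: move two bytes per iteration ("fastcopy"). */
    for (;;) {
        memcpy(b, from, 2);     /* may read one byte past the null */
        if (b[0] == '\0') {     /* first byte ends the string */
            *to = '\0';         /* store only the terminator */
            return ret;
        }
        memcpy(to, b, 2);       /* copy the entire word */
        if (b[1] == '\0')       /* second byte was the terminator */
            return ret;
        from += 2;
        to += 2;
    }
}
```

Testing the two bytes in memory order, rather than shifting a 16-bit
value, keeps the sketch independent of byte order; the assembly can test
the high byte first only because the 68000 is big-endian.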

I wonder, though, if this is truly faster.  Should not a movb/bnes
pair run in loop mode?  (Perhaps not; `dbcc' loops do, though, and
one could use a dbra surrounded by a bit of extra logic.)  Machine
dependent `fast' code is often CPU dependent as well, and one must
be prepared to modify marked inner loops when moving among
implementations of one architecture.
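
To make the "bit of extra logic" concrete, here is a C sketch of the
control flow I have in mind, not a tested 68010 routine. The name
`counted_strcpy' is made up. The inner do-while stands in for a
movb/dbeq pair, which leaves the loop either when the copied byte is
null or when its 16-bit counter runs out (65536 passes at most); the
outer loop is the extra logic that restarts it for long strings.

```c
/* counted_strcpy: hypothetical name; models a dbcc-style counted loop. */
char *counted_strcpy(char *to, const char *from)
{
    char *ret = to;

    for (;;) {
        /* A dbcc counter is 16 bits wide, so one run of the inner
         * loop covers at most 65536 bytes. */
        unsigned long count = 65536UL;

        do {
            /* Copy a byte; leave as soon as we copy the null. */
            if ((*to++ = *from++) == '\0')
                return ret;
        } while (--count != 0);
        /* Counter exhausted without seeing a null: go around again. */
    }
}
```

Whether the counted form wins depends on the particular CPU, which is
the point of the paragraph above.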
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu