Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!mit-eddie!uw-beaver!ubc-vision!alberta!calgary!radford
From: radford@calgary.UUCP
Newsgroups: comp.arch,comp.lang.c
Subject: Re: String Handling -- Incompetence of run-time libraries
Message-ID: <864@vaxb.calgary.UUCP>
Date: Thu, 2-Apr-87 14:34:01 EST
Article-I.D.: vaxb.864
Posted: Thu Apr  2 14:34:01 1987
Date-Received: Sun, 5-Apr-87 01:24:57 EST
References: <15292@amdcad.UUCP> <978@ames.UUCP> <15694@sun.uucp> <6071@mimsy.UUCP>
Organization: U. of Calgary, Calgary, Ab.
Lines: 34
Keywords: instruction set architectures, strcpy
Xref: utgpu comp.arch:766 comp.lang.c:1451
Summary: Unrolling will speed up strcpy

In article <6071@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:

> The proper way to speed strcpy() on a MicroVAX-II is no doubt to
> use the following assembly code:
> 
> 	_strcpy:.globl	_strcpy
> 		.word	0		# save no registers
> 		movq	4(ap),r1	# get s1 and s2 into r1 and r2
> 		movl	r1,r0		# save s1
> 	1:	movb	(r2)+,(r1)+	# *s1++ = *s2++
> 		bneq	1b		# loop until a zero is moved
> 		ret			# return original s1 in r0
> 
> Note that this is remarkably similar to the compiler's output
> for the original code, modified to have the proper return value:

> All one can improve on the locc-poor MicroVAX-II is the register
> usage and the parameter grabbing.  (c2, at least from 32V to 4.3BSD,
> will never turn two `movl's into a `movq'.  Ah well.)

Replacing your loop with:

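	1:	movb	(r2)+,(r1)+	# *s1++ = *s2++
		beql	2f		# done if a zero was moved (rarely taken)
		movb	(r2)+,(r1)+	# *s1++ = *s2++
		bneq	1b		# loop until a zero is moved
	2: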

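will almost certainly speed things up (by perhaps 15%). I haven't actually
tried it, but I have tried entirely analogous cases. Loop unrolling can
produce a speed-up even when the instruction count is unchanged, if
taken branches are replaced by untaken branches. Here the same number
of instructions is executed per byte, but every other backward branch
becomes a forward branch that falls through until the terminating zero
is moved.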
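
For readers who would rather see the control flow in C, here is a rough
sketch of the same two-way unrolling. The name strcpy_unrolled2 is made up
for illustration and is not from any library. A C compiler is free to
rearrange the branches, so this only shows the shape of the loop; it does
not promise that the generated code matches the assembly above, and I make
no timing claim for it.

```c
/*
 * Two-way unrolled string copy (illustrative sketch only).
 * strcpy_unrolled2 is a hypothetical name, not a library routine.
 * Returns the original destination pointer, like strcpy().
 */
char *strcpy_unrolled2(char *s1, const char *s2)
{
    char *ret = s1;                     /* save s1 for the return value */

    do {
        /* First copy: leave via a forward exit if a zero was moved.
         * For any string of more than a byte or two this exit is
         * usually not taken. */
        if ((*s1++ = *s2++) == '\0')
            break;
        /* Second copy: loop back only while a nonzero byte was moved. */
    } while ((*s1++ = *s2++) != '\0');

    return ret;
}
```

Each trip around the loop moves two bytes and takes one backward branch,
where the original loop takes one backward branch per byte.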

    Radford Neal