Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!mit-eddie!uw-beaver!ubc-vision!alberta!calgary!radford
From: radford@calgary.UUCP
Newsgroups: comp.arch,comp.lang.c
Subject: Re: String Handling -- Incompetence of run-time libraries
Message-ID: <864@vaxb.calgary.UUCP>
Date: Thu, 2-Apr-87 14:34:01 EST
Article-I.D.: vaxb.864
Posted: Thu Apr 2 14:34:01 1987
Date-Received: Sun, 5-Apr-87 01:24:57 EST
References: <15292@amdcad.UUCP> <978@ames.UUCP> <15694@sun.uucp> <6071@mimsy.UUCP>
Organization: U. of Calgary, Calgary, Ab.
Lines: 34
Keywords: instruction set architectures, strcpy
Xref: utgpu comp.arch:766 comp.lang.c:1451
Summary: Unrolling will speed up strcpy

In article <6071@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> The proper way to speed strcpy() on a MicroVAX-II is no doubt to
> use the following assembly code:
>
> _strcpy:.globl _strcpy
>         .word   0               # save no registers
>         movq    4(ap),r1        # get s1 and s2 into r1 and r2
>         movl    r1,r0           # save s1
> 1:      movb    (r2)+,(r1)+     # *s1++ = *s2++
>         bneq    1b              # loop until a zero is moved
>         ret                     # return original s1 in r0
>
> Note that this is remarkably similar to the compiler's output
> for the original code, modified to have the proper return value:
> All one can improve on the locc-poor MicroVAX-II is the register
> usage and the parameter grabbing.  (c2, at least from 32V to 4.3BSD,
> will never turn two `movl's into a `movq'.  Ah well.)

Replacing your loop with:

        1:      movb    (r2)+,(r1)+
                beql    2f
                movb    (r2)+,(r1)+
                bneq    1b
        2:

will almost certainly speed things up (say 15%). I haven't actually
tried it, but I've tried entirely analogous cases. Loop unrolling can
produce a speed-up even when the instruction count is unchanged, if
taken branches are replaced by untaken branches: here the first branch
falls through on every non-zero byte, so only every second byte copied
costs a taken branch.

    Radford Neal
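
For comp.lang.c readers, the two-way unrolled loop above can be sketched
in C roughly as follows. This is only an illustration: the names
copy_simple, copy_unrolled and check_copy_unrolled are made up for this
example, and whether a given compiler actually emits the fall-through
branch structure of the hand-written assembly is an assumption that
would have to be checked against its output.

```c
#include <assert.h>
#include <string.h>

/* Plain loop: one taken backward branch per byte copied.
 * (Hypothetical name, for comparison only.) */
char *copy_simple(char *s1, const char *s2)
{
    char *ret = s1;

    while ((*s1++ = *s2++) != '\0')
        ;
    return ret;
}

/* Unrolled by two, in the spirit of the assembly above: the first
 * test is meant to fall through on every non-zero byte, so only
 * every second byte copied should cost a taken branch.
 * (Hypothetical name; code generation is compiler-dependent.) */
char *copy_unrolled(char *s1, const char *s2)
{
    char *ret = s1;

    for (;;) {
        if ((*s1++ = *s2++) == '\0')
            break;              /* terminator copied by first move */
        if ((*s1++ = *s2++) == '\0')
            break;              /* terminator copied by second move */
    }
    return ret;
}

/* Quick consistency check over empty, odd-length and even-length
 * strings; both versions must produce the same result. */
void check_copy_unrolled(void)
{
    static const char *cases[] = { "", "a", "ab", "abc", "hello, world" };
    char a[32], b[32];
    size_t i;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        assert(copy_unrolled(a, cases[i]) == a);
        copy_simple(b, cases[i]);
        assert(strcmp(a, cases[i]) == 0);
        assert(strcmp(a, b) == 0);
    }
}
```

The two exits matter for correctness as well as speed: the terminating
zero can land on either the first or the second move, depending on
whether the string length is even or odd, which is why the check covers
both cases. Any actual speed-up would have to be measured on the
machine in question.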