Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!brl-adm!seismo!lll-lcc!ames!oliveb!sun!gorodish!guy
From: guy@gorodish.UUCP
Newsgroups: comp.arch,comp.lang.c
Subject: Re: String Handling -- Incompetence of run-time libraries
Message-ID: <15915@sun.uucp>
Date: Tue, 31-Mar-87 05:22:43 EST
Article-I.D.: sun.15915
Posted: Tue Mar 31 05:22:43 1987
Date-Received: Thu, 2-Apr-87 01:13:02 EST
References: <15292@amdcad.UUCP> <978@ames.UUCP> <15694@sun.uucp> <1530@husc6.UUCP>
Sender: news@sun.uucp
Lines: 62
Keywords: instruction set architectures, strcpy
Xref: utgpu comp.arch:723 comp.lang.c:1397

> This made me curious, and I started playing around with alternate versions of
> strcpy.  I did some timings on a microVAX II (cc -O, 4.3BSD) using the
> library strcpy...

Note that the library "strcpy" uses "locc" to find the length of the
source string and then does a "movc3" to copy it.  This requires two
passes over the source string.  Whether the whizzo VAX string
twiddling instructions are a win or not depends on how long the
strings are.  (Also, which of the whizzo string instructions does the
microVAX II implement in microcode and which are handled in
software?)

> I also ran some tests on a SUN-3/180 (SUN UNIX 4.2

Which release?  The "UNIX 4.2" is a conventional phrase.  The 4.2
refers to 4.2BSD; it's not the release number.

> routine    inline?    time (sec)
> strcpy     no         22.0
> strcpy2    no         21.1

This is almost certainly not significant.  The code for the SunOS 3.0
version of "strcpy" is:

    char *
    strcpy(s1, s2)
        register char *s1, *s2;
    {
        register char *os1;

        os1 = s1;
        while (*s1++ = *s2++)
            ;
        return (os1);
    }

The only difference is that yours doesn't return a pointer to the
original string (which it has to if it's to be compatible), so the
differences are almost certainly insignificant.
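For comparison with the one-pass loop above, here is a rough sketch in
present-day C of the two strategies being discussed.  It is not the
4.3BSD or SunOS source: the names "copy_one_pass" and "copy_two_pass"
are made up for illustration, and "strlen" followed by "memcpy" is
only assumed to be a fair stand-in for "locc" followed by "movc3".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name.  One pass: copy bytes until the NUL has been copied. */
char *copy_one_pass(char *dst, const char *src)
{
    char *start = dst;

    while ((*dst++ = *src++) != '\0')
        ;
    return start;
}

/*
 * Hypothetical name.  Two passes over the source: the first finds the
 * length (standing in for "locc"), the second does a block copy
 * (standing in for "movc3").
 */
char *copy_two_pass(char *dst, const char *src)
{
    size_t n = strlen(src) + 1;    /* include the terminating NUL */

    memcpy(dst, src, n);
    return dst;
}
```

Both return the destination pointer, as a compatible "strcpy" must.
Which one wins on a given machine depends on string length and on how
fast the block operations really are, so it has to be measured.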

The SunOS 3.2 version is:

    ENTRY(strcpy)
        movl    PARAM,a0        | s1
        movl    PARAM2,a1       | s2
        movl    a0,d0           | return s1 at the end
        moveq   #-1,d1          | count of 65535
    | The following loop runs in loop mode on 68010
    1$: movb    a1@+,a0@+       | move byte from s2 to s1
        dbeq    d1,1$
        bne     1$              | if zero byte seen, done
        RET

which is more-or-less the same thing, only using the 68010's moral
equivalent of whizzo string instructions.  In the cases I tested it
on, it was faster than the C version.  (Thanks to John Gilmore and
Vaughan Pratt for the little "bne 1$" trick at the end there.)

Unrolling the loop, as you did, might be a bigger win, especially on
the 68020 where even the unrolled loop would probably fit in the
instruction cache.
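
To make the control flow of the assembler version easier to follow,
here is a sketch of it in C, together with one possible four-way
unrolled loop.  Both function names are hypothetical, the 16-bit
counter models only the low word of d1 that "dbeq" looks at, and the
unrolled routine is my guess at the general shape of such a loop, not
the code from the article being replied to.

```c
#include <stdint.h>

/*
 * Hypothetical model of the dbeq/bne loop.  The inner loop stops either
 * because a zero byte was copied or because the 16-bit counter wrapped
 * back around to 0xFFFF; the outer test plays the part of "bne 1$" and
 * restarts the inner loop if the last byte copied was not zero.
 */
char *strcpy_dbeq_model(char *dst, const char *src)
{
    char *start = dst;
    uint16_t count = 0xFFFF;    /* moveq #-1,d1 (low word only) */
    char c;

    do {
        do {
            c = *src++;         /* movb a1@+,a0@+ */
            *dst++ = c;
        } while (c != '\0' && --count != 0xFFFF);    /* dbeq d1,1$ */
    } while (c != '\0');        /* bne 1$ */
    return start;
}

/* Hypothetical four-way unrolled copy: one backward branch per 4 bytes. */
char *strcpy_unrolled(char *dst, const char *src)
{
    char *start = dst;

    for (;;) {
        if ((*dst++ = *src++) == '\0') break;
        if ((*dst++ = *src++) == '\0') break;
        if ((*dst++ = *src++) == '\0') break;
        if ((*dst++ = *src++) == '\0') break;
    }
    return start;
}
```

The outer loop in the model only matters for strings longer than 65536
bytes; for anything shorter the inner loop ends on the zero byte and
the routine returns at once.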