Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!husc6!endor!reiter
From: reiter@endor.UUCP
Newsgroups: comp.arch,comp.lang.c
Subject: Re: String Handling -- Incompetence of run-time libraries
Message-ID: <1530@husc6.UUCP>
Date: Mon, 30-Mar-87 17:49:39 EST
Article-I.D.: husc6.1530
Posted: Mon Mar 30 17:49:39 1987
Date-Received: Wed, 1-Apr-87 01:36:22 EST
References: <15292@amdcad.UUCP> <978@ames.UUCP> <15694@sun.uucp> <18036@ucbvax.BERKELEY.EDU> <1944@hoptoad.uucp>
Sender: news@husc6.UUCP
Reply-To: reiter@harvard.UUCP (Ehud Reiter)
Organization: Aiken Computation Lab Harvard, Cambridge, MA
Lines: 65
Keywords: instruction set architectures, strcpy
Xref: utgpu comp.arch:711 comp.lang.c:1386

In article <1944@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) pointed out
that printf could be recoded to run much faster. This made me curious, and
I started playing around with alternate versions of strcpy. I did some
timings on a microVAX II (cc -O, 4.3BSD) using the library strcpy, a
straightforward version of my own (strcpy2), and a version which tried to
minimize branches out of the instruction sequence (strcpy3). I ran strcpy2
and strcpy3 both as procedure calls and as in-line expansions. The timings,
for 1,000,000 executions of copying "Test string 1", were

	routine		inline?		time (sec)
	strcpy		no		134.7
	strcpy2		no		65.2
	strcpy3		no		51.6
	strcpy2		yes		43.3
	strcpy3		yes		30.8

The most shocking thing about the above is that the library strcpy is half
the speed of a very straightforward C implementation of the routine!
Perhaps we should spend less time arguing about hardware support for
string-handling routines, and more time making sure that the people who
implement the run-time library are somewhat competent. It also certainly
appears that in-line expansion of simple library routines is a big win.

I also ran some tests on a SUN-3/180 (SUN UNIX 4.2, cc -O). The results
were:

	routine		inline?		time (sec)
	strcpy		no		22.0
	strcpy2		no		21.1
	strcpy3		no		20.1
	strcpy2		yes		15.1
	strcpy3		yes		13.7

The SUN library routines seem much more competently coded, but even here,
perhaps a bit of tuning would help (note that my timing was pretty sloppy,
so small differences may not be significant), and in-line expansion would
certainly be a big win.

Routine codes:

strcpy2(ss1,ss2)
char *ss1,*ss2;
{
	register char *s1,*s2;

	s1 = ss1;
	s2 = ss2;
	while (*s1++ = *s2++);
}

strcpy3(ss1,ss2)
char *ss1,*ss2;
{
	register char *s1,*s2;

	s1 = ss1;
	s2 = ss2;
	while (*s1++ = *s2++) {
		if (!(*s1++ = *s2++)) break;
		if (!(*s1++ = *s2++)) break;
		/* the above statement is repeated 20 times */
	}
}

***************************************************************************
Ehud Reiter
reiter@harvard (ARPA, BITNET, UUCP)
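
For anyone who wants to repeat the comparison with a current compiler, here is a minimal sketch of the two hand-written routines with ANSI prototypes, plus a small timing helper. It is not the code that produced the numbers above. The names copy_simple, copy_unrolled and time_copies, the unroll factor of 4 (the post repeats the step 20 times), and the 64-byte scratch buffer are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Straightforward copy, in the spirit of strcpy2 (hypothetical name). */
char *copy_simple(char *dst, const char *src)
{
    char *d = dst;

    while ((*d++ = *src++) != '\0')
        ;                       /* the copy happens in the loop condition */
    return dst;
}

/*
 * Unrolled copy, in the spirit of strcpy3 (hypothetical name).
 * Each step copies one byte and leaves the loop at the terminating NUL,
 * so the backward branch is taken only once per four bytes.
 * The unroll factor of 4 is an assumption; the post used 20.
 */
char *copy_unrolled(char *dst, const char *src)
{
    char *d = dst;

    for (;;) {
        if ((*d++ = *src++) == '\0') break;
        if ((*d++ = *src++) == '\0') break;
        if ((*d++ = *src++) == '\0') break;
        if ((*d++ = *src++) == '\0') break;
    }
    return dst;
}

/* Any routine with strcpy's shape can be timed through this pointer type. */
typedef char *(*copy_fn)(char *, const char *);

/*
 * Time `reps` copies of `src` through `fn` and return CPU seconds as
 * reported by clock(). Calls go through a function pointer, so this
 * corresponds to the "inline? no" rows, not the in-line expansions.
 */
double time_copies(copy_fn fn, const char *src, long reps)
{
    char buf[64];               /* assumed large enough for the test string */
    volatile char sink = 0;     /* keeps the optimizer from dropping the copies */
    clock_t start, end;
    long i;

    assert(strlen(src) < sizeof buf);

    start = clock();
    for (i = 0; i < reps; i++) {
        fn(buf, src);
        sink ^= buf[0];
    }
    end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copies(strcpy, "Test string 1", 1000000L), and then the same with copy_simple and copy_unrolled, mirrors the three procedure-call rows in the tables. On modern hardware a million short copies may finish in less than clock() can resolve, so a much larger repetition count is probably needed before the differences mean anything.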