Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!cbatt!decuac!cvl!mimsy!chris
From: chris@mimsy.UUCP
Newsgroups: comp.arch,comp.lang.c
Subject: Re: String Handling -- Incompetence of run-time libraries
Message-ID: <6071@mimsy.UUCP>
Date: Wed, 1-Apr-87 13:45:05 EST
Article-I.D.: mimsy.6071
Posted: Wed Apr 1 13:45:05 1987
Date-Received: Sat, 4-Apr-87 07:32:31 EST
References: <15292@amdcad.UUCP> <978@ames.UUCP> <15694@sun.uucp> <9996@sri-spam.istc.sri.com>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 81
Keywords: instruction set architectures, strcpy
Xref: utgpu comp.arch:745 comp.lang.c:1421

[To comp.arch readers who just want to read about architectures:
skip this.  It is just correcting various points in other articles,
none of which had much to do with architecture in the first place.]

In some article, someone adds these arrows and notes:
>>$strcpy2(ss1,ss2)
>>$char *ss1,*ss2;<---------------------------- put "register" on this line
>>${ register char *s1,*s2;<-------------- as many compilers will use this
>>$                                         as local variables overriding
>>$     s1 = ss1;                           the arguments.
[etc]

In article <9996@sri-spam.istc.sri.com> robert@sri-spam.istc.sri.com
(Robert Allen) writes:
>To further speed this up you could write it as:
>       strcpy2(ss1,ss2)
>       register char *ss1, *ss2;
>       {
>               while (*s1++ = *s2++)
>                       ;
>       }
>thus eliminating two assignments and two local variables.

Now these people all seem to be talking about Vaxen; I feel I must
point out the various errors here.  It is clear that neither the
original someone nor Robert Allen actually compiled these to
assembly.  (Use `cc -O -S' to do this.)

First, the notes attached to the arrows are wrong.  Compilers cannot
override the arguments when the names differ.  The loop copies from
s2 to s1; the parameters were ss2 and ss1.

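To make the naming point concrete, here is a minimal sketch of the two
quoted variants side by side.  It is an illustration only, under some
assumptions: it uses ANSI prototypes and `const' rather than the
old-style declarations in the quoted articles, and the function names
copy_with_locals and copy_with_params are made up for this example.
In the first function s1 and s2 are separate variables that have to be
initialized from ss1 and ss2; in the second, the loop has to name the
parameters themselves, because no s1 or s2 is declared anywhere.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical name.  First variant: s1 and s2 are distinct locals,
 * explicitly initialized from the parameters ss1 and ss2. */
void copy_with_locals(char *ss1, const char *ss2)
{
    register char *s1 = ss1;
    register const char *s2 = ss2;

    while ((*s1++ = *s2++) != '\0')
        ;   /* empty body: the copy happens in the condition */
}

/* Hypothetical name.  Second variant, with the loop corrected to use
 * the declared parameter names.  Writing *s1++ = *s2++ here would not
 * compile, since s1 and s2 do not exist in this function. */
void copy_with_params(register char *ss1, register const char *ss2)
{
    while ((*ss1++ = *ss2++) != '\0')
        ;   /* empty body: the copy happens in the condition */
}
```

Both functions copy up to and including the terminating zero byte, so
from the caller's side they behave the same; neither returns a value,
as in the quoted strcpy2.
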
Second, the code produced for the second example (once corrected to
`while (*ss1++ = *ss2++)') is *identical* to the code for the first!
(Vax running 32V, 3BSD, 4BSD, and, no doubt, Sys3, Sys5, V8, and V9.)
The two assignments the second version attempts to eliminate must
still occur, for `ss1' and `ss2' are passed on the stack, and must be
copied into the two registers---the same two registers allocated in
the first version.

The proper way to speed strcpy() on a MicroVAX-II is no doubt to use
the following assembly code:

        _strcpy:.globl  _strcpy
                .word   0               # save no registers
                movq    4(ap),r1        # get s1 and s2 into r1 and r2
                movl    r1,r0           # save s1
        1:      movb    (r2)+,(r1)+     # *s1++ = *s2++
                bneq    1b              # loop until a zero is moved
                ret                     # return original s1 in r0

Note that this is remarkably similar to the compiler's output for
the original code, modified to have the proper return value:

        char *
        strcpy(ss1, ss2)
                char *ss1;
                char *ss2;
        {
                register char *s1 = ss1, *s2 = ss2;

                while ((*s1++ = *s2++) != 0)
                        /* void */;
                return (ss1);   /* must return the original value */
        }

                .globl  _strcpy
        _strcpy:
                .word   0xc00           # save r11 and r10
                movl    4(ap),r11       # here is s1
                movl    8(ap),r10       # and s2
        L16:    movb    (r10)+,(r11)+   # *s1++ = *s2++
                jneq    L16             # (this assembles to a `bneq')
                movl    4(ap),r0        # return the original s1
                ret

All one can improve on the locc-poor MicroVAX-II is the register
usage and the parameter grabbing.  (c2, at least from 32V to 4.3BSD,
will never turn two `movl's into a `movq'.  Ah well.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:   seismo!mimsy!chris      ARPA/CSNet:     chris@mimsy.umd.edu