Path: utzoo!attcan!uunet!husc6!bloom-beacon!mcgill-vision!mouse
From: mouse@mcgill-vision.UUCP (der Mouse)
Newsgroups: comp.lang.c
Subject: Re: "conventional" copy fragment?
Message-ID: <1317@mcgill-vision.UUCP>
Date: 1 Oct 88 09:38:26 GMT
References: <8809092109.AA06071@tycho.yerkes.uchicago.edu> <2585@ingr.UUCP>
Organization: McGill University, Montreal
Lines: 105

In article <2585@ingr.UUCP>, jones@ingr.UUCP (Mark Jones) writes:
> In article <8809092109.AA06071@tycho.yerkes.uchicago.edu>, pearce@TYCHO.YERKES.UCHICAGO.EDU ("Eric C. Pearce") writes:
>> Personally, I prefer this "conventional" copy fragment:
>> [code]
> How about this "conventional" copy fragment
> memcpy(array2,array1,sizeof(*A)*Count);

As with most of these arguments, it really depends on the specific
application.  On some machines (such as the VAX), function call
overhead is large enough that if you're copying small amounts of data
(like ten or twelve bytes), it's faster to inline it.

> Use the functions in the C library.  If you need more speed, recode
> the library routines in assembly, and link them in ahead of the
> standard library.

Not enough: even the function call overhead can swamp the time taken to
perform the copy.

> Don't write trashy C code to try to get better speed.  It just ain't
> worth it, now or later.

Welcome to the real world, where we need to get the response packet out
to the robot within 28 milliseconds or the whole thing comes to a
screeching halt.  If inlining a copy makes the difference between a
working system and a not-quite-working system, it's worth it.

This is not just my idea, either.  I wrote a program to find out what's
really happening.  This was run on an otherwise free MicroVAX-II under
4.3BSD with the standard 4.3 pcc-based cc, with no optimization options
enabled.  The times given are real times, but they stayed stable across
several runs, so they ought to be reasonably good.

Here's what the various lines mean:

Overhead:       The time taken when there's no copy being done (ie,
                loop overhead).
Fancy inline:   The "conventional" copy recommended by "Eric C.
                Pearce", with all inner-loop variables declared
                register.
Simple inline:  An inline loop whose core is
                        for (nb=...;nb>0;nb--) *to++ = *from++;
                where all three variables are declared register.
asm inline:     In-line asm() directives to implement a loop such as
                the one used for "Simple inline".
Fxncall:        Calling a function containing the same loop used by
                "Simple inline".
Library:        Calling memcpy().

All pointers are pointers to char, so copying copies one byte at a
time.

Here are the times resulting from 100000 iterations, copying 12 bytes
each loop:

        Overhead      =  2.3 usec/loop
        Fancy inline  = 60.6 usec/loop
        Simple inline = 49.1 usec/loop
        asm inline    = 40.1 usec/loop
        Fxncall       = 74.3 usec/loop
        Library       = 71.9 usec/loop

And here, 10000 iterations, copying 512 bytes each loop:

        Overhead      =    2 usec/loop
        Fancy inline  =  979 usec/loop
        Simple inline = 1881 usec/loop
        asm inline    = 1654 usec/loop
        Fxncall       = 1877 usec/loop
        Library       = 1794 usec/loop

Moral: Nothing beats actually trying a few things and finding out
what's best.  Of course, "fastest" is not always "best", though if you
care enough to start unrolling your block copies, speed is presumably
important to you.  In particular, note the differences depending on the
amount of data being copied, so what you want to do depends on how much
stuff you're typically copying.

For the curious, the program I used appears below; to extract it, feed
this through atob and uncompress.

xbtoa Begin +.\KC+HUZ6SS[U1.XYouRoH)*A#Y^r_5GN1;hD=O;?2_Msos$KRF3P @e6kU-a]g:uN.*;mgGt_L4Ihm+o'dl+@0CO:DQ&:%p%9@HL9]'KaP8J.YpXM\]tlJCr?Fc&*5YHH`M Gk&XJGI2&JX?ma+549F3'7"JAn#eY]n(C%_#,-Q'DS71P(8 \Y'2IPn*qf]_,\EIfhj1gp.m)H7K*8NKA#n6sTd1kCGMmb8@s'baP.-7(5n'U;&#VU].hg_@@=Fu.p 1$2SdOYu;kFFAR,+'kaX-K/HsiF!f=(ja8k$?Tr0)M]h2kTPg5Nn0nZR_]9`MIt('UQ$5TfHQ;4VKe kp56AqON]Om&A<0:d)'b%lp&RKpW2U!N*$2r47N&JUEI'1=]!;><#t b@O"16)))M8MB,hjgV_PXO=$fo<80"99c3l`1KS3]YBE\n_f(u/0&R]^S&;+;C5KPF_Yk)HpARDPZn S=7O,\K%-caf1\FRBC_`5[gn5L;9Pg>-ME?URJEe$aZ;W_luT]lbn`j225.:!?2a]Nm='HC9i[>#>n `S\L2R4KcD).X0-a=EtE+5nT5k\0ud8lYlJUbq?"'/7j"Qn#7>Y`70d#.TqFa(@=1dR@Z)f`7ic(VL 34BG=t9ebkbUEqgMV?@GK(bii=sIgVGKPL[0M+(0S4)[O9bL#`c@f)TDX.>^\d,2IuE RKgU,Os&]2Ns-S.Ui%k*#4/s(AhHk$9*8-]K*&;*%!i!qtrX!qRn9-uL;q_T,(q6c;-0@ZL#ogV;"[ 6[Gj_P'Pe8-]_&OF]^Iaq,q&_>GHRK1=nZBai'N&P+F0-j5).X_9pMl;*?1RGl_modLp,d=Z!uTEWtd@O$_C6oRRjNi0N)'@NO0Sk\`3
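
The encoded program above cannot be read without decoding, so here is a minimal sketch of the kind of timing harness the post describes. It is not the author's program. The names `copy_loop`, `copy_memcpy`, `usec_per_loop` and `MAXBYTES` are hypothetical, and it differs from the post in two ways that matter: it uses clock(), which reports CPU time rather than the real time the post measured, and it calls every variant through a function pointer, so it only reproduces the "Fxncall" and "Library" cases. To measure the inline variants, the copy loop would have to be written directly inside the timed loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define MAXBYTES 512    /* hypothetical limit; the post's largest copy size */

typedef void (*copy_fn)(char *to, const char *from, size_t nb);

/* The post's "Simple inline" loop, wrapped in a function ("Fxncall"). */
static void copy_loop(char *to, const char *from, size_t nb)
{
    for (; nb > 0; nb--)
        *to++ = *from++;
}

/* The "Library" variant: just hand the job to memcpy(). */
static void copy_memcpy(char *to, const char *from, size_t nb)
{
    memcpy(to, from, nb);
}

/*
 * Time `iters` copies of `nbytes` bytes each and return the average
 * CPU time per iteration in microseconds.  Returns -1.0 if the
 * arguments are out of range.
 */
static double usec_per_loop(copy_fn fn, size_t nbytes, long iters)
{
    static char src[MAXBYTES], dst[MAXBYTES];
    clock_t start, end;
    size_t k;
    long i;

    if (nbytes > MAXBYTES || iters <= 0)
        return -1.0;

    /* Fill the source with a recognizable pattern and clear the target. */
    for (k = 0; k < nbytes; k++)
        src[k] = (char)(k & 0x7f);
    memset(dst, 0, sizeof dst);

    start = clock();
    for (i = 0; i < iters; i++)
        fn(dst, src, nbytes);
    end = clock();

    /* Sanity check: the bytes really were copied. */
    assert(memcmp(dst, src, nbytes) == 0);

    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC / (double)iters;
}
```

Calls such as `usec_per_loop(copy_loop, 12, 100000L)` and `usec_per_loop(copy_memcpy, 512, 10000L)` would correspond to the two runs reported in the post. As in the post, this is best compiled with optimization off, since an optimizing compiler may collapse the repeated identical copies. Subtracting the time of an empty loop, as the "Overhead" line does, is left out of the sketch.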