Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!samsung!crackers!jjmhome!smds!rh
From: rh@smds.UUCP (Richard Harter)
Newsgroups: comp.lang.c
Subject: Memory copy timings
Message-ID: <144@smds.UUCP>
Date: 2 Aug 90 08:01:08 GMT
Organization: SMDS Inc., Concord, MA
Lines: 111

A number of memcpy versus "his macro" results have been posted.  As far
as I can recall, all were large block moves (5K - 20K).  None of the
postings mentioned checking unaligned moves, i.e. situations where the
destination and/or source do not lie on even word boundaries.

This is not entirely realistic.  If you are going to use a memory copy
routine (and you should) you are going to use it to copy short blocks
as well as long; if you are going to copy character strings it will
often be the case that they are unaligned.

I ran an experiment on four machines: a generic 386, a SUN 3/50, a
Mac II/cx running AUX, and a Tektronix XD88/10.  On each of the four
machines I copied 100,000,000 bytes.  I set up three cases: (a) aligned
character moves, (b) unaligned character moves, and (c) integer moves.
In all cases I used the maximum optimization available (none of the
postings mentioned whether they had optimization turned on).  In each
case I used six different block sizes ranging from 10 to 1000.  (Larger
block sizes are dominated by the inner loop; shorter blocks have sundry
overhead costs.)  Block sizes for ints were 4 times larger than block
sizes for characters (experiment design flaw).

The following tables have six lines, one for each block size.  Column 1
is the block size, columns 2 and 3 are the times for aligned character
moves for the macro and memcpy respectively, columns 4 and 5 are the
times for unaligned character moves, and columns 6 and 7 are the times
for integer moves.  In each case the times are the times to move
100,000,000 bytes.

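A minimal sketch of a harness along the lines described above, for the
two character-move cases only.  Everything named here is an assumption
made for illustration: BYTE_COPY is a hypothetical plain byte loop
standing in for the macro from the earlier postings (which is not
reproduced in this article), and time_copy, print_timings, src_area and
dst_area are invented names.  "Unaligned" is taken to mean an offset of
one byte from a long boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the hand-written copy macro: a plain
 * byte-at-a-time loop.  The macro actually discussed in the thread
 * is not reproduced here. */
#define BYTE_COPY(to, from, count)          \
    do {                                    \
        char *d_ = (to);                    \
        const char *s_ = (from);            \
        size_t n_ = (count);                \
        while (n_-- > 0)                    \
            *d_++ = *s_++;                  \
    } while (0)

#define MAX_BLOCK 1000
#define SLACK     8       /* room for the unaligned offset */

/* The union forces the byte arrays onto a long boundary, so that
 * offset 0 is "aligned" and offset 1 is "unaligned". */
static union {
    long force_align;
    char bytes[MAX_BLOCK + SLACK];
} src_area, dst_area;

/* Copy roughly `total` bytes in blocks of `block` bytes, starting
 * `offset` bytes into each buffer.  Returns elapsed CPU seconds. */
static double time_copy(size_t block, size_t offset, long total,
                        int use_memcpy)
{
    long reps = total / (long)block;
    long i;
    clock_t start = clock();

    for (i = 0; i < reps; i++) {
        if (use_memcpy)
            memcpy(dst_area.bytes + offset, src_area.bytes + offset, block);
        else
            BYTE_COPY(dst_area.bytes + offset, src_area.bytes + offset, block);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Print one row per block size: macro and memcpy times for aligned
 * moves, then macro and memcpy times for unaligned moves. */
static void print_timings(long total)
{
    static const size_t sizes[] = { 10, 25, 50, 100, 250, 1000 };
    size_t k;

    for (k = 0; k < sizeof sizes / sizeof sizes[0]; k++) {
        printf("%5lu %8.2f %8.2f %8.2f %8.2f\n",
               (unsigned long)sizes[k],
               time_copy(sizes[k], 0, total, 0),
               time_copy(sizes[k], 0, total, 1),
               time_copy(sizes[k], 1, total, 0),
               time_copy(sizes[k], 1, total, 1));
    }
}
```

Calling print_timings(100000000L) from main would match the byte count
used here.  An optimizing compiler may notice that the same block is
copied repeatedly and drop some of the work, so any numbers from a
sketch like this should be checked against the generated code before
they are trusted.
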
Results and comments follow:

                386 Timings -- Esix operating system

         aligned char  unaligned char         integer
Block   macro  memcpy   macro  memcpy   macro  memcpy
   10     187      88     188      88      51      31
   25     131      40     134      42      39      16
   50     121      30     120      30      36      14
  100     115      16     117      29      27      10
  250     106      13     107      22      27       7
 1000      90       9      86      19      26       6

Comments:  The 386 has a hardware block move instruction.  Hardware
beats software hands down.  Clearly one wants to use memcpy, even for
very short copies.  Enough said.

                SUN 3/50 OS 3.5

         aligned char  unaligned char         integer
Block   macro  memcpy   macro  memcpy   macro  memcpy
   10     154     199     154     219      39      62
   25     115      93     115     149      29      40
   50     105      60     105     127      26      33
  100      99      42      99     116      25      29
  250      98      38      98     110      24      28
 1000      93      28      93     105      23      26

Comments:  The SUN memcpy apparently checks for alignment and switches
to word moves when alignment is right; however, it apparently doesn't
use loop unrolling.  The macro doesn't check alignment.  You could add
code to check alignment, but I don't see a clean, portable way to do
it.  Is it worth using the macro?  It's debatable.  If you use it for
ints, short char moves, and known unaligned moves you buy 10-40%.  On
the other hand, using memcpy saves thought and maintenance costs, and
it will be superior when and if SUN optimizes the routine.  This is a
tradeoff situation.

                MACINTOSH IICX AUX 2.0

         aligned char  unaligned char         integer
Block   macro  memcpy   macro  memcpy   macro  memcpy
   10     134     186     130     183      33     127
   25     101     137      97     134      25     114
   50      92     121      89     117      23     107
  100      88     113      84     110      23     107
  250      84     109      80     105      21     106
 1000      82     106      79     102      20     106

Comments:  AUX is a young OS.  One suspects that memcpy is two lines
of C.  If performance is an issue, you might well consider rolling your
own copy routine.

                Tektronix XD88/10 -- Greenhills C-88000 1.8.4

         aligned char  unaligned char         integer
Block   macro  memcpy   macro  memcpy   macro  memcpy
   10      44      21      44      37      11       8
   25      39      12      39      32      10       5
   50      38       8      38      30      10       4
  100      38       6      38      28      10       4
  250      38       4      38      28       9       4
 1000      37       4      37      28       9       4

Comments:  Greenhills has a very good reputation; from these timings it
appears warranted.  Memcpy is the winner here by a clear margin.  An
interesting point here is that optimization in a compiled language
depends in part on helping the compiler produce efficient code.  The
arrangement of code gives the compiler information.
The cited macro is basically a CISC optimization; compilers for RISC
machines probably need information that the macro does not supply.

----

Conclusions:  Memcpy is safe, portable (mostly), and doesn't involve
any maintenance issues.  On many machines it will be faster than
anything you can code.  It should be; the systems people can do
anything that you can do plus machine-code specific optimizations that
you don't have access to.  However, it is clear that the quality of the
implementation of system utilities varies a great deal.  If performance
is an important issue (or you have a system without memcpy or
equivalent) you may want to write your own.

Enough on this topic.
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.