Path: utzoo!attcan!uunet!cs.utexas.edu!sdd.hp.com!usc!jarthur!nntp-server.caltech.edu!seismo.gps.caltech.edu!bruce
From: bruce@seismo.gps.caltech.edu (Bruce Worden)
Newsgroups: comp.lang.c
Subject: Re: A study in code optimization in C
Summary: Some statistics for various machines
Keywords: memcopy
Message-ID: <1990Jul28.203800.17258@laguna.ccsf.caltech.edu>
Date: 28 Jul 90 20:38:00 GMT
References: <133@smds.UUCP> <1990Jul26.144134.16053@ux1.cso.uiuc.edu> <1349@proto.COM>
Sender: bruce@seismo.gps.caltech.edu (Bruce Worden)
Organization: Seismological Laboratory, California Institute of Technology, CA
Lines: 100

In article <1349@proto.COM> joe@proto.COM (Joe Huffman) writes:
>In article <1990Jul26.144134.16053@ux1.cso.uiuc.edu>, mcdonald@aries.scs.uiuc.edu (Doug McDonald) writes:
>> In article <133@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
>> >
>> >The macro shown below is an optimized memory to memory copy macro.
>> >It is probably faster than memcopy on your machine -- I have checked
>> >it on several machines and have always found it to be faster.
>> !!!!!!
>> Oh My!.
>> Time on my computer, in seconds, for 1000 copies of a 20 kilobyte array:
>>                 His code        library memcpy
>> Compiler 1:
>>  (chars)         12.6            2.7
>>  (ints)           6.9            2.7
>> Compiler 2:
>>  (chars)         23.6            1.3
>>  (ints)           6.9            1.3
>[Stuff deleted... compilers were Microsoft and Microway NDPC, machine was
>20 MHz 386]
>
>I just ran it on a 20 MHz 386 running SCO UNIX.  The timing were done with
>5000 copies but then divided by 5 to make the numbers comparable.
>                 His code                            library memcpy
>SCO supplied MSC 5.1
> (chars)         14.0                                2.05
>Zortech
>                 386 code generator not available    1.80

Here are the results on some machines I could find the other day. The
compilers are the native compilers unless otherwise stated. I used
whatever compiler optimizations I could.

20 kbyte arrays, 1000 copies (times in seconds, as in the quoted tables):

Sun Sparcstation 1+
                His code        memcpy
 chars:           7.6            2.0
 ints:            2.0            2.0

Sun 4/280
                His code        memcpy
 chars:           9.8            2.8
 ints:            2.5            2.8

Sun Sparcstation SLC
                His code        memcpy
 chars:           9.9            2.6
 ints:            2.5            2.6

Sun 386i
                His code        memcpy
 chars:           9.5            2.6
 ints:            2.4            2.6

Sun 3/160
                His code        memcpy
 chars:          13.7            4.5
 ints:            3.4            4.5

Inmos T800 (Meiko, 25MHz; somewhat unfair because of the block_copy
instruction)
                His code        memcpy
 chars:          37.6            1.6
 ints:            8.4            1.6

i860 (Meiko, 40MHz, Green Hills C-I860 1.8.5, beta assembler 1.41,
beta linker 1.2)
                His code        memcpy
 chars:           2.1            3.9
 ints:            0.9            3.9

Convex C120 (vector mode: yes, his code vectorizes nicely; memcpy not
available, used bcopy)
                His code        bcopy
 chars:           3.0            1.0
 ints:            1.0            1.0

Convex C120 (scalar mode; memcpy not available, used bcopy)
                His code        bcopy
 chars:          28.4            1.5
 ints:            7.5            1.5

BBN TC2000 (Motorola 88000-based, Green Hills C-88000 2.35(1.8.4))
                His code        memcpy
 chars:          10.3           12.0
 ints:            4.9           12.0

In general, I'd say Richard's code does a pretty good job when moving
ints: on most of these machines it matches or comes close to the library
routine. It also beats the library routine outright on the young
machines (the BBN and the Meiko i860). In addition, his code is about
20% faster than a simple "for" loop on my Sparc 1+, so it illustrates a
useful principle as well. I intend to use it in some selected
applications; thanks for posting it.

BIG TIME DISCLAIMER: I in no way intended this to be a comparison of
different machines, but of the performance of a piece of C code on each
of several different machines. There are a lot of ways to do timings,
and most of them aren't very good, so please don't flame me if I didn't
do justice to some machine's absolute performance. It is the relative
timings that matter. If I screwed that up, flame away (though a nice
note explaining the error might be more instructive).

Bruce

P.S. For timing I used getusecclock() on the BBN, ticks() on the Meikos,
and getrusage() on everything else.
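
For anyone who wants to try this kind of comparison, here is a minimal
sketch of a getrusage()-based harness. It is not the code used for the
tables above and it does not include Richard's macro. It only times the
library memcpy against a simple "for" loop. The names NBYTES, NCOPIES,
user_seconds, loop_copy, lib_copy, and time_copies are all made up for
illustration.

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

#define NBYTES  20480   /* assumed size of a "20 kbyte" array */
#define NCOPIES 1000    /* number of copies per timing run */

/* User CPU time consumed so far by this process, in seconds. */
static double user_seconds(void)
{
    struct rusage ru;

    getrusage(RUSAGE_SELF, &ru);
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
}

/* Baseline: a simple byte-at-a-time "for" loop. */
static void loop_copy(char *dst, const char *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Wrapper so the library memcpy has the same signature as loop_copy. */
static void lib_copy(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}

typedef void (*copy_fn)(char *, const char *, size_t);

/* Run `reps` copies of n bytes through fn; return elapsed user time. */
static double time_copies(copy_fn fn, char *dst, const char *src,
                          size_t n, int reps)
{
    double start = user_seconds();
    int r;

    for (r = 0; r < reps; r++)
        fn(dst, src, n);
    return user_seconds() - start;
}
```

Calling time_copies(loop_copy, dst, src, NBYTES, NCOPIES) and then
time_copies(lib_copy, dst, src, NBYTES, NCOPIES) on two NBYTES-sized
char arrays gives one pair of figures to compare. Only the ratio is
meaningful, and an aggressive optimizer may drop or merge the repeated
copies, so check the generated code before trusting a surprisingly
small number.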