Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!uc!cs.umn.edu!thelake!steve
From: steve@thelake.mn.org (Steve Yelvington)
Newsgroups: comp.lang.c
Subject: Re: Re: A study in code optimization in C
Message-ID:
Date: 31 Jul 90 20:28:24 GMT
References: <133@smds.UUCP> <1990Jul26.144134.16053@ux1.cso.uiuc.edu> <1349@proto.COM> <1990Jul28.203800.17258@laguna.ccsf.caltech.edu>
Organization: Otter Lake Leisure Society
Lines: 31
X-Member-Of: STdNET
X-Bad-Pun: There's no place like Nome for the Hollandaise.

[In article <1990Jul28.203800.17258@laguna.ccsf.caltech.edu>, bruce@seismo.gps.caltech.edu (Bruce Worden) writes ... ]

> In general, I'd say Richard's code does a pretty good job when moving int's,
> and also when compared to young machines (the BBN and the Meiko i860.)
> In addition, his code is about 20% faster than a simple "for" loop on my
> Sparc 1+, so it illustrates a useful principle as well. I intend to
> use it in some selected applications, thanks for posting it.

Bruce is one of the few people who seem to have seen the point -- which
(to me, anyway) was just an illustration of C coding technique, not a
claim that it's possible to beat Brand X compiler's mondo-optimized
assembler memcpy().

For collectors of useless numbers, here are results from an 8-MHz
16-bit Motorola 68000 (Atari ST), 1000 iterations, 20K buffers:

    library memcpy:          17.125 seconds
    Richard's gencpy char:   40.755 seconds
    Richard's gencpy int:    20.385 seconds
    Richard's gencpy long:   15.460 seconds

Details: Sozobon C compiler, dLibs public-domain C library, optimizer
turned on. sizeof(int) == 2 (16 bits); sizeof(long) == 4 (32 bits).

The dLibs memcpy is coded in assembler, moves 16-bit words when possible,
and DOES check for overlaps (as in memmove). Its copy is a simple loop.
Loading and dumping registers with movem.l might be faster; I have not
tried.

-- 
Steve Yelvington at the (rain-replenished) lake in Minnesota
steve@thelake.mn.org
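For readers who want the char/int/long distinction in the timings spelled out, here is a minimal C sketch. It is not Richard's gencpy (its code isn't reproduced in this post) and not the dLibs memcpy; DEFINE_COPY, copy_chars, copy_ints, copy_longs and iterations are hypothetical names. It assumes the source and destination are non-overlapping, naturally aligned arrays of the element type being copied (no memmove-style overlap check).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT Richard's gencpy: defines a function that
   copies n elements of type T, one element per loop iteration.
   Like memcpy (and unlike memmove), it assumes no overlap. */
#define DEFINE_COPY(name, T)                            \
    static void name(T *dst, const T *src, size_t n)    \
    {                                                   \
        assert(n == 0 || (dst != NULL && src != NULL)); \
        while (n-- > 0)                                 \
            *dst++ = *src++;                            \
    }

DEFINE_COPY(copy_chars, char)
DEFINE_COPY(copy_ints, int)
DEFINE_COPY(copy_longs, long)

/* Loop iterations needed to move `bytes` bytes in units of `unit`
   bytes; assumes bytes is a multiple of unit, as with 20K buffers. */
static size_t iterations(size_t bytes, size_t unit)
{
    return bytes / unit;
}
```

On the ST, where sizeof(int) == 2 and sizeof(long) == 4, iterations(20480, 1), iterations(20480, 2) and iterations(20480, 4) give 20480, 10240 and 5120 loop passes per buffer. That matches the ordering of the timings above. The int-to-long gain is smaller than the char-to-int gain, possibly because the 68000 moves a long over its 16-bit data bus in two accesses. Word and long moves on the 68000 also need even addresses, which is presumably why a library memcpy can only move 16-bit words "when possible".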