Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!uc!cs.umn.edu!thelake!steve
From: steve@thelake.mn.org (Steve Yelvington)
Newsgroups: comp.lang.c
Subject: Re: Re: A study in code optimization in C
Message-ID: <A2072777983@thelake.mn.org>
Date: 31 Jul 90 20:28:24 GMT
References: <133@smds.UUCP> <1990Jul26.144134.16053@ux1.cso.uiuc.edu> <1349@proto.COM> <1990Jul28.203800.17258@laguna.ccsf.caltech.edu>
Organization: Otter Lake Leisure Society
Lines: 31
X-Member-Of:  STdNET
X-Bad-Pun:    There's no place like Nome for the Hollandaise.

[In article <1990Jul28.203800.17258@laguna.ccsf.caltech.edu>,
     bruce@seismo.gps.caltech.edu (Bruce Worden) writes ... ]

> In general, I'd say Richard's code does a pretty good job when moving int's,
> and also when compared to young machines (the BBN and the Meiko i860.)
> In addition, his code is about 20% faster than a simple "for" loop on my
> Sparc 1+, so it illustrates a useful principle as well.  I intend to
> use it in some selected applications, thanks for posting it.

Bruce is one of the few people who seem to have seen the point -- which
(to me, anyway) was just an illustration of C coding technique, not a
claim that it's possible to beat Brand X compiler's mondo-optimized
assembler memcpy().

For collectors of useless numbers, here are results from an 8 MHz
16-bit Motorola 68000 (Atari ST), 1000 iterations, 20K buffers:

library memcpy:           17.125 seconds
Richard's gencpy char:    40.755 seconds
Richard's gencpy int:     20.385 seconds
Richard's gencpy long:    15.460 seconds

Details: Sozobon C compiler, dLibs public-domain C library.
Optimizer turned on. sizeof(int) == 2 (16 bits); sizeof(long) == 4 (32 bits).
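
For anyone who wants to repeat the measurement, here is a minimal sketch
of a timing loop. It is not the harness that produced the numbers above;
the names time_copies and copy_fn are made up for illustration, and it
assumes clock() has usable resolution on the target system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical type: the signature shared by memcpy and any
 * candidate copy routine with the same calling convention. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Copy `bytes` bytes `iterations` times with `fn`.
 * Returns elapsed CPU time in seconds, or -1.0 if allocation fails. */
double time_copies(copy_fn fn, size_t bytes, int iterations)
{
    unsigned char *src, *dst;
    volatile unsigned char sink;
    clock_t start, stop;
    int i;

    assert(fn != NULL && bytes > 0 && iterations > 0);

    src = malloc(bytes);
    dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5A, bytes);   /* give the copy something to move */

    start = clock();
    for (i = 0; i < iterations; i++)
        fn(dst, src, bytes);
    stop = clock();

    /* Read the destination so the copies cannot be discarded. */
    sink = dst[bytes - 1];
    (void)sink;

    free(src);
    free(dst);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

A call such as time_copies(memcpy, 20 * 1024, 1000) matches the buffer
size and iteration count quoted above.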

The dLibs memcpy is coded in assembler, moves 16-bit words when possible,
and DOES check for overlaps (as in memmove). The copy is a simple loop.
Loading and dumping registers with movem.l might be faster; I have not tried.
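
To make that description concrete, here is a rough C sketch of the same
idea: move 16-bit words when both pointers are word-aligned, fall back to
bytes otherwise, and copy backward when the destination overlaps the tail
of the source. This is not the dLibs source (which is assembler); the name
wordcpy is hypothetical, and the uint16_t casts assume a compiler that
tolerates this kind of type punning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not the dLibs routine. */
void *wordcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    int aligned;

    assert(n == 0 || (dst != NULL && src != NULL));
    if (n == 0 || d == s)
        return dst;

    /* Word moves are only used when both addresses are even
     * (a 68000 faults on a word access at an odd address). */
    aligned = (((uintptr_t)d | (uintptr_t)s) & 1u) == 0;

    if ((uintptr_t)d < (uintptr_t)s || (uintptr_t)d >= (uintptr_t)s + n) {
        /* No harmful overlap: copy forward. */
        if (aligned) {
            uint16_t *dw = (uint16_t *)d;
            const uint16_t *sw = (const uint16_t *)s;
            size_t words;
            for (words = n / 2; words > 0; words--)
                *dw++ = *sw++;
            d = (unsigned char *)dw;
            s = (const unsigned char *)sw;
            n &= 1;                 /* at most one trailing byte left */
        }
        while (n > 0) {
            *d++ = *s++;
            n--;
        }
    } else {
        /* dst starts inside src: copy backward from the end. */
        d += n;
        s += n;
        if (aligned) {
            uint16_t *dw;
            const uint16_t *sw;
            size_t words;
            if (n & 1) {            /* odd length: peel the last byte */
                *--d = *--s;
                n--;
            }
            dw = (uint16_t *)d;
            sw = (const uint16_t *)s;
            for (words = n / 2; words > 0; words--)
                *--dw = *--sw;
            n = 0;
        }
        while (n > 0) {
            *--d = *--s;
            n--;
        }
    }
    return dst;
}
```

The overlap test costs a compare or two per call, not per word, so it
should not account for much of the time in a 20K copy.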
-- 
   Steve Yelvington at the (rain-replenished) lake in Minnesota
   steve@thelake.mn.org