Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!crackers!jjmhome!smds!rh
From: rh@smds.UUCP (Richard Harter)
Newsgroups: comp.lang.c
Subject: Re: Memory copy timings
Summary: char moves more likely to be unaligned?
Message-ID: <145@smds.UUCP>
Date: 5 Aug 90 05:11:05 GMT
References: <144@smds.UUCP> <3510@goanna.cs.rmit.oz.au>
Organization: SMDS Inc., Concord, MA
Lines: 52

In article <3510@goanna.cs.rmit.oz.au>, ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
> In article <144@smds.UUCP>, rh@smds.UUCP (Richard Harter) writes:
> > A number of memcpy versus "his macro" results have been posted.  ...
> > None of the postings mentioned checking unaligned moves, ...
> > This is not entirely realistic.

> I don't see why.  Whatever method you are using for doing bulk
> moves (BLT instruction, TRT instruction, bcopy(), memcpy(),
> memmove(), ...) having things aligned is likely to help.  Any
> programmer who cares enough about the performance of block transfer
> to be wondering whether memcpy() is fast enough should really have
> taken care of alignment first.

Agreed -- alignment helps and anyone concerned about performance should
worry about alignment.  The point is that if one is copying a block of
characters as such, e.g. strings, one quite regularly hits unaligned
moves.  An example would be copying a substring out of an array (or into
an array): the substring can start at any byte offset, so the programmer
does not get to choose its alignment.
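
To make the substring case concrete, here is a minimal sketch.  The
helper names misalignment() and copy_substring() are made up for
illustration, and converting a pointer to uintptr_t to inspect its low
bits is the conventional trick rather than something the language
guarantees to be meaningful on every machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: how many bytes p lies past the previous
 * `align`-byte boundary.  align must be a power of two. */
static size_t misalignment(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size_t)((uintptr_t)p & (align - 1));
}

/* Hypothetical helper: copy text[start .. start+len) into dst and
 * NUL-terminate it.  dst must have room for len + 1 bytes.  The source
 * address text + start is whatever the substring offset makes it, so
 * this copy is unaligned more often than not. */
static void copy_substring(char *dst, const char *text,
                           size_t start, size_t len)
{
    memcpy(dst, text + start, len);
    dst[len] = '\0';
}
```

Even if text itself is well aligned, misalignment(text + start, 4) is
simply start modulo 4, so a timing test that uses only aligned
operands never exercises this path.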

> This also applies to fread(), fwrite(), and (on systems with a POSIX
> interface, such as DEC are promising RSN for VMS) read() and write().
> On all the UNIX systems where I've tried the comparison, I've found
> that making read/write buffers be "well" aligned (the alignment that
> malloc() guarantees is fine) was usefully faster than having them be
> misaligned.

That is almost guaranteed to be the case.  If I am not mistaken, almost
all systems read fixed-size blocks into internal buffers and then copy
the results from those buffers to your specified destination.  (Waiting
to be told that I am wrong. :-))  There is a rumour to the effect that
your I/O will be faster if you read and write in fixed block sizes that
are integral multiples of the system block size.  Does anyone have
opinions or information on the trade-offs involved?  For example, is there
a performance gain one way or the other from doing one read into a buffer
of 1024 bytes and then copying out a series of items, versus doing a
separate read for each item?  Does anyone have any data on this?  Does
anyone care? :-)
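
For anyone who wants to time it, the two strategies might be sketched
as follows.  This assumes a POSIX read(); ITEM_SIZE, BUF_SIZE and both
function names are invented for the example, and it settles nothing
about which one wins on a given system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define ITEM_SIZE 16     /* assumed fixed item size, for illustration */
#define BUF_SIZE  1024   /* the 1024-byte buffer from the question    */

/* Strategy A: one read() call per item.
 * Returns the number of complete items stored. */
static size_t read_items_one_by_one(int fd, char (*items)[ITEM_SIZE],
                                    size_t max_items)
{
    size_t n = 0;
    while (n < max_items) {
        ssize_t got = read(fd, items[n], ITEM_SIZE);
        if (got != ITEM_SIZE)
            break;               /* EOF, error, or short read: stop */
        n++;
    }
    return n;
}

/* Strategy B: one read() per BUF_SIZE bytes, then memcpy each item out.
 * Simplifications: a trailing partial item in a short read is dropped,
 * and bytes left in buf once max_items is reached are discarded. */
static size_t read_items_buffered(int fd, char (*items)[ITEM_SIZE],
                                  size_t max_items)
{
    char buf[BUF_SIZE];
    size_t n = 0;

    assert(BUF_SIZE % ITEM_SIZE == 0);
    while (n < max_items) {
        ssize_t got = read(fd, buf, sizeof buf);
        if (got <= 0)
            break;               /* EOF or error */
        size_t avail = (size_t)got / ITEM_SIZE;
        for (size_t i = 0; i < avail && n < max_items; i++)
            memcpy(items[n++], buf + i * ITEM_SIZE, ITEM_SIZE);
    }
    return n;
}
```

Strategy A pays for a system call per item; strategy B pays for one
system call per buffer plus a user-level copy per item.  Which cost
dominates is exactly the data being asked for.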

> Most of the time, the best way to speed up block transfer is not to
> do it at all, but to twiddle your pointers around...

Yep, no argument here.  There are cases where that doesn't apply, e.g.
the data you want is in a transient area, you're changing object size,
you're going to modify the copied data later on, etc.  Another area where
there are large potential gains is the allocation and deallocation of objects
of the same type.  One often wins big by maintaining a free list of objects.
And so on...
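
A minimal sketch of the free-list idea, for the record.  The struct
node type and the node_alloc()/node_free() names are hypothetical, and
the sketch is single-threaded and never returns memory to the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object type.  The next_free link is only meaningful
 * while the object is sitting on the free list. */
struct node {
    struct node *next_free;
    int value;
};

static struct node *free_list = NULL;   /* head of recycled objects */

/* Get a node: reuse a recycled one if available, else call malloc().
 * Returns NULL only if malloc() fails. */
static struct node *node_alloc(void)
{
    struct node *n = free_list;
    if (n != NULL)
        free_list = n->next_free;       /* pop from the free list */
    else
        n = malloc(sizeof *n);          /* list empty: go to malloc */
    return n;
}

/* Give a node back: push it on the free list instead of calling free(). */
static void node_free(struct node *n)
{
    if (n == NULL)
        return;
    assert(n != free_list);             /* cheap check for an immediate double free */
    n->next_free = free_list;
    free_list = n;
}
```

Once the list is warm, both operations are a couple of pointer
assignments, which is where the win over a general-purpose allocator
would come from.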
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.