Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!ukc!dcl-cs!aber-cs!rupert!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Is handling off-alignment important? (was Re: RISC hard to program?)
Message-ID: <PCG.90Jul21134457@rupert.cs.aber.ac.uk>
Date: 21 Jul 90 13:44:57 GMT
References: <40088@mips.mips.COM> <2162@opus.cs.mcgill.ca>
	<3648@auspex.auspex.com> <2163@opus.cs.mcgill.ca>
	<104037@convex.convex.com> <1990Jul18.190750.7282@zoo.toronto.edu>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 60
In-reply-to: henry@zoo.toronto.edu's message of 18 Jul 90 19:07:50 GMT


In article <1990Jul18.190750.7282@zoo.toronto.edu> henry@zoo.toronto.edu
(Henry Spencer) writes:

   In article <104037@convex.convex.com> patrick@convex.com (Patrick F.
   McGehearty) writes:

   >Trivial example: consider the std libc bcopy which takes two pointers and a
   >count.  Most machine specific implementations move the data in units larger
   >than a character at time.  Under what conditions should the implementor of
   >this commonly used library worry about checking the alignment of the
   >pointers before starting the copy?

   Essentially always, unless the count is very small.  Even on machines that
   handle misalignment, if the alignment on the two areas is compatible, it
   is better to copy enough initial bytes to align the pointers and then do
   an aligned copy for the bulk of the data.

Doing aligned moves of aligned blocks of storage is a win on most
machines, as Spencer says. Not only should a memory copy routine detect
and exploit the (hopefully fairly common) case where the source and
destination are already naturally aligned; it should also, on machines
that make it easy, try to artificially align the bulk of the copy
operation.

One problem is that when destination and source are aligned
differently you have to choose whether to align the copy w.r.t. the
source or the destination. It is best to align the destination,
especially on write-through cache machines, where it wins sometimes by
a spectacular margin.

Example: if we have to copy 73 bytes from address 102 to address 245, we
should (assuming 4 bytes is the optimal block copy word size):

	split the 73 bytes into three segments, of 3, 17x4=68, and 2 bytes.

	copy 3 bytes from address 102 to 245
	copy 17 words from address 105 to address 248
	copy 2 bytes from address 173 to address 316
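
The split above follows mechanically from the destination address and
the byte count. A minimal sketch in C, assuming a hypothetical helper
named split_for_dest (the name and the struct are mine, not from any
libc) and taking the word size as a parameter:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: describe a copy of `count` bytes as head bytes,
   whole words, and tail bytes, chosen so that every word-sized store
   lands on a `word`-aligned destination address. */
struct copy_split {
    size_t head;   /* bytes copied singly before the aligned body */
    size_t words;  /* whole words copied with aligned stores */
    size_t tail;   /* bytes copied singly after the aligned body */
};

static struct copy_split split_for_dest(uintptr_t dst, size_t count,
                                        size_t word)
{
    struct copy_split s;
    size_t head;

    assert(word != 0);

    /* Distance from dst up to the next multiple of `word` (0 if dst
       is already aligned). */
    head = (size_t)((word - dst % word) % word);
    if (head > count)
        head = count;          /* copy too short to reach alignment */

    s.head = head;
    s.words = (count - head) / word;
    s.tail = (count - head) % word;
    return s;
}
```

For the example, split_for_dest(245, 73, 4) gives head = 3, words = 17,
tail = 2. The source address does not enter into the split at all,
which is the point of aligning on the destination.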

Note that in the word-by-word copy the source is not word aligned,
but the destination is. Many machines can cope with unaligned fetches
fairly well, but unaligned stores are usually catastrophic.
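
A destination-aligned copy loop along these lines might be sketched in
C as follows. This is only an illustration under stated assumptions:
the name copy_dest_aligned is hypothetical, the word size is fixed at
4 bytes, the regions must not overlap, and the word fetch goes through
memcpy so that a misaligned source stays well defined in portable C (a
tuned routine for a particular machine would use its own load and
store instructions).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: forward copy of n bytes whose word-sized stores
   are all aligned on the destination. Regions must not overlap. */
static void *copy_dest_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: single bytes until the destination is word aligned. */
    while (n > 0 && (uintptr_t)d % sizeof(uint32_t) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: d is word aligned here, s may not be. Fetch a word from
       the (possibly unaligned) source, then store it to the aligned
       destination. memcpy with a constant size is typically compiled
       to a single load or store where the target permits it. */
    while (n >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }

    /* Tail: remaining bytes, fewer than one word. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Whether the unaligned fetches in the body are cheap is machine
dependent; on a machine that traps on them, the body would have to
assemble each word from two aligned loads with shifts instead.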

My usual example is the VAX-11/780, which had an 8-byte buffer
between the CPU and the system bus leading to memory, and was
write-through. Each byte written could cause an 8-byte read from
memory, and an 8-byte write back to memory, ...

As Spencer and I have already remarked, this means that a suitable
software memory copy routine can easily beat hardware memory copies,
for suitably large copy sizes, and by a large margin.

Yet another reason for having simple CPUs and avoiding microprograms (if
you can afford the instruction fetch bandwidth, or use compact
instruction encodings, e.g. a stack architecture).
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk