Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!ukc!dcl-cs!aber-cs!rupert!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Is handling off-alignment important? (was Re: RISC hard to program?)
Message-ID:
Date: 21 Jul 90 13:44:57 GMT
References: <40088@mips.mips.COM> <2162@opus.cs.mcgill.ca> <3648@auspex.auspex.com> <2163@opus.cs.mcgill.ca> <104037@convex.convex.com> <1990Jul18.190750.7282@zoo.toronto.edu>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 60
In-reply-to: henry@zoo.toronto.edu's message of 18 Jul 90 19:07:50 GMT

In article <1990Jul18.190750.7282@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:

  In article <104037@convex.convex.com> patrick@convex.com (Patrick F. McGehearty) writes:
  >Trivial example: consider the std libc bcopy which takes two pointers and a
  >count. Most machine specific implementations move the data in units larger
  >than a character at time. Under what conditions should the implementor of
  >this commonly used library worry about checking the alignment of the
  >pointers before starting the copy?

  Essentially always, unless the count is very small. Even on machines that
  handle misalignment, if the alignment on the two areas is compatible, it
  is better to copy enough initial bytes to align the pointers and then do
  an aligned copy for the bulk of the data.

Doing aligned moves of aligned blocks of storage is a win on most machines, as Spencer says. Not only should a memory copy routine detect and exploit the (hopefully fairly common) case where the source and destination are already naturally aligned; on machines that make it easy, it should also try to align the bulk of the copy operation artificially.

One problem is that when the destination and source are aligned differently, you have to choose whether to align the copy w.r.t. the source or the destination.
It is best to align the destination, especially on machines with write-through caches, and the gain is sometimes spectacular. Example: if we have to copy 73 bytes from address 102 to address 245, we should (assuming 4 bytes is the optimal block copy word size) split the 73 bytes into three segments of 3, 17x4=68, and 2 bytes, since 245 is one byte past a word boundary and 3 single-byte moves bring the destination to 248:

  copy 3 bytes from address 102 to address 245
  copy 17 words from address 105 to address 248
  copy 2 bytes from address 173 to address 316

Note that the word-by-word copy has a source that is not word aligned, but the destination is. Many machines cope fairly well with unaligned fetches, but unaligned stores are usually catastrophic. My usual example is the VAX-11/780, which had an 8-byte buffer between the CPU and the system bus leading to memory, and used write-through. Each byte written could cause an 8-byte read from memory and an 8-byte write back to memory, ...

As Spencer and I have already remarked, this means that a suitable software memory copy routine can easily beat hardware memory copy instructions, for sufficiently large copy sizes, and by a large margin. Yet another reason for having simple CPUs and avoiding microprograms (if you can afford the instruction fetch bandwidth, or use compact instruction encodings, e.g. a stack architecture).

--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk