Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!lll-winken!uunet!microsoft!w-colinp
From: w-colinp@microsoft.UUCP (Colin Plumb)
Newsgroups: comp.arch
Subject: Re: Unaligned Accesses (was Re: How to use silicon)
Message-ID: <59@microsoft.UUCP>
Date: 25 Mar 89 02:39:54 GMT
References: <37196@bbn.COM> <1989Mar16.190043.23227@utzoo.uucp> <24889@amdcad.AMD.COM> <355@bnr-fos.UUCP> <13@microsoft.UUCP> <362@bnr-fos.UUCP>
Reply-To: w-colinp@microsoft.uucp (Colin Plumb)
Organization: very little
Lines: 60

mlord@bnr-public.UUCP (Mark Lord) wrote:
> The "break into multiple aligned arrays" approach can work fine as long
> as one is dealing with arrays (of structures) to begin with.  Unfortunately,
> I suspect that this is not of much use with dynamically allocated linked
> lists of structures, which is very likely when we are talking about multiple
> megabytes of data.

Yes, but if you have at least one 32-bit pointer per list node, odds are
very good your other data is much more than 32 bits long.  So again, the
waste (limited to 3 bytes, usually) is trivial.

> Granted, by careful (human) reordering of the fields within a complex
> structure, one can almost always reduce wasted space to less than one
> machine word/line per allocated item.  This is a good thing that we all do
> when possible on new code, but it is much more difficult (and risky)
> to go back and try to do this with code in a very large existing base
> of (otherwise) good, working software.

Actually, it can be done by machine.  A simple working algorithm is to sort
the elements by size (strictly, by alignment requirement, which for scalar
types is usually the same as size), starting with the largest.  And, as I
said, the wasted space has an upper bound of (greatest alignment
restriction)-(size of smallest data item), usually 4 and 1 bytes,
respectively.  Sometimes 8 and 1.
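A minimal sketch of the reordering idea, written in Python purely as a simulation of the usual C layout rules: each member starts at a multiple of its alignment, and the total is rounded up to the largest alignment. The member names (flag, next, kind, count, tag, value) and the 32-bit sizes (pointer 4, short 2, char 1) are hypothetical assumptions, not taken from any real program. The sort key is alignment, which for these scalar types is the same as sorting by size.

```python
from typing import List, Tuple

# A member is (name, size in bytes, alignment in bytes).
Member = Tuple[str, int, int]

def layout_size(members: List[Member]) -> int:
    """Struct size when members are placed in the given order:
    each starts at a multiple of its alignment, and the total is
    rounded up to a multiple of the largest alignment (tail padding)."""
    offset = 0
    max_align = 1
    for _name, size, align in members:
        offset = (offset + align - 1) // align * align  # internal padding
        offset += size
        max_align = max(max_align, align)
    return (offset + max_align - 1) // max_align * max_align

def waste(members: List[Member]) -> int:
    """Padding bytes: laid-out size minus the sum of the member sizes."""
    return layout_size(members) - sum(size for _name, size, _align in members)

def reorder(members: List[Member]) -> List[Member]:
    """Largest alignment first; sorted() is stable, so ties keep declared order."""
    return sorted(members, key=lambda m: m[2], reverse=True)

# Hypothetical list node on a 32-bit machine (pointer/long = 4, short = 2, char = 1).
node = [("flag", 1, 1), ("next", 4, 4), ("kind", 1, 1),
        ("count", 2, 2), ("tag", 1, 1), ("value", 4, 4)]

print(layout_size(node), waste(node))                    # 20 7
print(layout_size(reorder(node)), waste(reorder(node)))  # 16 3
```

With a 4-byte largest alignment and a 1-byte smallest member, the 3 bytes left after reordering sit exactly at the (greatest alignment)-(smallest item) bound.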

If the structures aren't dumped, raw, to a file, I would have no hesitation
reordering structures in my code.
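Why a raw dump is the exception: writing a struct with something like fwrite(&s, sizeof s, 1, fp) captures its member order and padding, so reordering the declaration silently changes the file format. Writing each field explicitly in a fixed format avoids that. A small Python sketch using the standard struct module; the record layout (a 32-bit offset, a 16-bit count, an 8-bit flag) is a hypothetical example.

```python
import struct

# Hypothetical record: a 32-bit offset, a 16-bit count, an 8-bit flag.
# "<" means little-endian with standard sizes and no alignment padding,
# so the bytes depend only on this format string, not on how the
# in-memory struct happens to be ordered or padded.
RECORD = struct.Struct("<IHB")

def write_record(next_off: int, count: int, flag: int) -> bytes:
    return RECORD.pack(next_off, count, flag)

def read_record(data: bytes) -> tuple:
    return RECORD.unpack(data)

blob = write_record(0x01020304, 7, 1)
print(RECORD.size)        # 7: no padding bytes
print(blob[:4].hex())     # 04030201 on any host
print(read_record(blob))  # (16909060, 7, 1)
```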

> The debate for devoting more silicon space to support efficient handling
> of misaligned accesses seems to hinge around supporting the huge amounts
> of perfectly good software that already exist, written before programmers
> in general became keenly aware of alignment/efficiency tradeoffs.

Given that that software needs to be recompiled anyway, and is resistant
to big/little endian changes (i.e. doesn't use disgusting tricks), I
don't see how it can have much dependency on struct packing.

> I feel that this must be addressed by chip designers NOW.  The MIPS R2000/
> R3000 family have included instructions for this purpose, LWL/LWR, SWL/SWR,
> but these must usually be used in pairs to achieve the desired effect.
> A big problem with such is atomicity of load/store operations, especially
> nasty when the software in question is also being used on one or more CISC
> machines which never exhibit this problem.  Still, MIPS has at least
> provided a starting point for misaligned accesses, while other chip makers
> have yet to address the problem at all (e.g. MC88100).

CISC processors have exactly the same problems.  They still have to break
an unaligned access up into two accesses.  The only difference is that you
lose the flexibility of deciding how the split is done.  If the second
access faults (a page fault, say), the VAX will restart both accesses,
while a 68020 will only re-execute the second.  In both cases, an
arbitrarily long time can pass between the first access (the first attempt
at the first access, in the case of the VAX) and the second.  How is this
different from what a RISC chip provides?
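The split can be simulated. The sketch below models a hypothetical big-endian machine that only supports aligned 32-bit loads, and builds an unaligned load from two aligned loads plus shifts. This is the same split a CISC chip makes internally and, in spirit, what a LWL/LWR pair does; it does not model the exact register-merge behavior of those MIPS instructions. The memory contents and function names are made up for illustration.

```python
# Hypothetical big-endian memory image: bytes 0x00, 0x01, ..., 0x0f.
MEM = bytes(range(16))

def load_aligned(addr: int) -> int:
    """The only load this simulated hardware offers: a 4-byte-aligned word."""
    if addr % 4 != 0:
        raise ValueError("alignment fault")
    if addr + 4 > len(MEM):
        raise IndexError("bus error: address out of range")
    return int.from_bytes(MEM[addr:addr + 4], "big")

def load_unaligned(addr: int) -> int:
    """Build a 32-bit load at any address from at most two aligned loads."""
    base = addr - addr % 4
    shift = (addr % 4) * 8           # misalignment, in bits
    if shift == 0:
        return load_aligned(base)    # already aligned: one access suffices
    hi = load_aligned(base)          # first access: holds the leading bytes
    lo = load_aligned(base + 4)      # second access: may fault independently
    return ((hi << shift) | (lo >> (32 - shift))) & 0xFFFFFFFF

print(hex(load_unaligned(1)))  # 0x1020304
print(hex(load_unaligned(6)))  # 0x6070809
```

Between the two load_aligned calls the second access can fault (here, reading past the end of MEM raises IndexError), so the combined load is not atomic whether the split is done by microcode or by a pair of instructions.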

I don't quite understand the problem.  Alignment restrictions are almost
zero hassle in most code, are enforced by compilers for efficiency reasons
even on processors which don't require them (the Microsoft C compiler
inserts pad bytes in the instruction stream to get the target of a branch
instruction word-aligned), and simplify the hardware tremendously.
It seems like a win-win.  I never really used what I'm giving up, anyway.
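The branch-target padding amounts to rounding an address up to a power-of-two boundary. A minimal sketch; the function names are hypothetical, and treating a word as 2 bytes (as on 8086-family targets) is an assumption.

```python
def align_up(addr: int, align: int) -> int:
    """Round addr up to the next multiple of align (a positive power of two)."""
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    return (addr + align - 1) & ~(align - 1)

def pad_bytes(addr: int, align: int) -> int:
    """Filler bytes needed before a label at addr so that the label
    lands on an align-byte boundary."""
    return align_up(addr, align) - addr

WORD = 2  # assumption: 16-bit word
print(pad_bytes(0x0101, WORD))  # 1: odd address needs one pad byte
print(pad_bytes(0x0102, WORD))  # 0: already word-aligned
```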
-- 
	-Colin (uunet!microsoft!w-colinp)