Path: utzoo!utgpu!bnr-vpa!bnr-fos!bnr-public!mlord
From: mlord@bnr-public.uucp (Mark Lord)
Newsgroups: comp.arch
Subject: Unaligned Accesses (was Re: How to use silicon)
Message-ID: <362@bnr-fos.UUCP>
Date: 22 Mar 89 17:13:31 GMT
References: <37196@bbn.COM> <1989Mar16.190043.23227@utzoo.uucp> <24889@amdcad.AMD.COM> <355@bnr-fos.UUCP> <13@microsoft.UUCP>
Sender: news@bnr-fos.UUCP
Reply-To: mlord@bnr-public.UUCP (Mark Lord)
Organization: Bell-Northern Research, Ottawa, Canada
Lines: 40

w-colinp@microsoft.uucp (Colin Plumb) writes:
>In any structure, if you rearrange the components, you can lose at most
>n-1 bytes to padding, where n is the strictest alignment restriction.  For
>most processors, the worst case is a double and a char, 7 bytes out of 16
>wasted.  But if this is a major concern, rewrite the code to use two parallel
>arrays.  You'll waste at most 7 bytes total (in your 100Meg).
>

The "break into multiple aligned arrays" approach can work fine as long
as one is dealing with arrays (of structures) to begin with.  Unfortunately,
I suspect that it is of little use with dynamically allocated linked
lists of structures, which are a very likely representation when we are
talking about multiple megabytes of data.
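To make the quoted arithmetic concrete, here is a minimal C sketch comparing one array of structs with the parallel-array layout. The type struct rec and the helper names are hypothetical, and the padding figure is implementation-defined: 7 bytes per element where double is 8-byte aligned, fewer on ABIs that align double to 4. Note that the parallel-array form relies on a shared index i to tie values[i] to tags[i], which is exactly what a pointer-linked list of nodes does not provide.

```c
#include <stddef.h>

/* Hypothetical record: a double plus a char, the worst case
   mentioned in the quoted article. */
struct rec {
    double value;
    char   tag;
};

/* Padding the compiler adds to each struct rec (implementation-defined). */
size_t padding_per_rec(void) {
    return sizeof(struct rec) - sizeof(double) - sizeof(char);
}

/* Bytes needed for n records stored as one array of structs. */
size_t aos_bytes(size_t n) {
    return n * sizeof(struct rec);
}

/* Bytes needed for the same n records split into two parallel arrays,
   double values[n] and char tags[n]; neither array needs per-element
   padding, since each holds a single type. */
size_t soa_bytes(size_t n) {
    return n * sizeof(double) + n * sizeof(char);
}
```

The gap between aos_bytes(n) and soa_bytes(n) grows linearly with n, while the parallel arrays waste at most a few bytes in total.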

Granted, by careful (human) reordering of the fields within a complex
structure, one can almost always reduce wasted space to less than one
machine word/line per allocated item.  This is good practice that we all
follow in new code when possible, but it is much more difficult (and risky)
to go back and do this in a very large existing base of (otherwise) good,
working software.
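A small C illustration of the reordering described above, with hypothetical struct and field names. Sorting fields from strictest to loosest alignment moves the padding to the tail of the struct. The offsetof comparison in the test hints at one plausible reason retrofitting is risky: any existing code, data file, or wire format that depends on the old field offsets breaks silently.

```c
#include <stddef.h>

/* Hypothetical "legacy" layout, fields in the order someone happened to
   write them: padding follows each char to keep the double aligned. */
struct legacy {
    char   flag;
    double amount;
    char   kind;
};

/* The same fields sorted from strictest to loosest alignment; padding
   now appears only at the end of the struct. */
struct reordered {
    double amount;
    char   flag;
    char   kind;
};

size_t legacy_size(void)    { return sizeof(struct legacy); }
size_t reordered_size(void) { return sizeof(struct reordered); }
```

On a typical 64-bit ABI, legacy_size() is 24 and reordered_size() is 16, but the exact values depend on the compiler and target.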

The debate over devoting more silicon space to efficient handling of
misaligned accesses seems to hinge on supporting the huge amount of
perfectly good software that already exists, written before programmers
in general became keenly aware of alignment/efficiency tradeoffs.

I feel that this must be addressed by chip designers NOW.  The MIPS
R2000/R3000 family has included instructions for this purpose (LWL/LWR and
SWL/SWR), but these must usually be used in pairs to achieve the desired
effect.  A big problem with such pairs is that the access is no longer
atomic, which is especially nasty when the software in question is also
used on one or more CISC machines that never exhibit this problem.  Still,
MIPS has at least provided a starting point for misaligned accesses, while
other chip makers have yet to address the problem at all (e.g. the MC88100).
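A rough C model of the LWL/LWR pairing and the atomicity hazard described above. It assumes big-endian MIPS semantics and a hypothetical word-addressed memory array; the function names are illustrative, not a real API, and the "race" is simulated sequentially rather than with real concurrency.

```c
#include <stdint.h>

/* Toy model of big-endian MIPS LWL/LWR. mem[i] is the aligned word at byte
   address 4*i; byte offset 0 is its most significant byte. Alignment traps,
   little-endian mode and pipeline details are ignored. */

/* LWL: copy bytes ea .. end-of-word into the high end of rt. */
uint32_t lwl(const uint32_t *mem, uint32_t ea, uint32_t rt) {
    uint32_t w = mem[ea >> 2];
    unsigned b = ea & 3;                           /* byte offset in word */
    uint32_t keep = (UINT32_C(1) << (8 * b)) - 1;  /* rt bytes left as-is */
    return (w << (8 * b)) | (rt & keep);
}

/* LWR: copy bytes start-of-word .. ea into the low end of rt. */
uint32_t lwr(const uint32_t *mem, uint32_t ea, uint32_t rt) {
    uint32_t w = mem[ea >> 2];
    unsigned b = ea & 3;
    uint32_t repl = UINT32_MAX >> (8 * (3 - b));   /* rt bytes replaced */
    return (w >> (8 * (3 - b))) | (rt & ~repl);
}

/* The usual pair for an unaligned word load: LWL at addr, LWR at addr+3. */
uint32_t load_unaligned(const uint32_t *mem, uint32_t addr) {
    uint32_t rt = lwl(mem, addr, 0);
    return lwr(mem, addr + 3, rt);
}

/* An unaligned store performed as one indivisible step, as a CISC
   machine (or another processor) might do it. */
void store_unaligned(uint32_t *mem, uint32_t addr, uint32_t v) {
    for (unsigned i = 0; i < 4; i++) {
        uint32_t a = addr + i;
        unsigned shift = 8 * (3 - (a & 3));
        uint32_t byte = (v >> (8 * (3 - i))) & 0xFF;
        mem[a >> 2] = (mem[a >> 2] & ~(UINT32_C(0xFF) << shift)) | (byte << shift);
    }
}

/* The LWL/LWR pair with a competing store landing between the two halves:
   the result mixes bytes from before and after the store. */
uint32_t load_unaligned_racing(uint32_t *mem, uint32_t addr, uint32_t new_value) {
    uint32_t rt = lwl(mem, addr, 0);
    store_unaligned(mem, addr, new_value);   /* the intervening write */
    return lwr(mem, addr + 3, rt);
}
```

With the words 0x00112233 and 0x44556677 in mem, load_unaligned(mem, 1) yields 0x11223344. If a store of 0xAABBCCDD to address 1 lands between the two halves, the pair returns 0x112233DD, a value that was never in memory.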

-Mark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*  The Company doesn't even know I'm reading this stuff, let alone  *
*  writing it  (maybe THAT's why I'm not working on it anymore!).   *
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~