Path: utzoo!utgpu!bnr-vpa!bnr-rsc!mlord
From: mlord@bnr-rsc.UUCP (Mark Lord)
Newsgroups: comp.arch
Subject: Re: Unaligned Accesses (was Re: How to use silicon)
Message-ID: <844@bnr-rsc.UUCP>
Date: 27 Mar 89 14:55:02 GMT
References: <37196@bbn.COM> <1989Mar16.190043.23227@utzoo.uucp> <24889@amdcad.AMD.COM> <355@bnr-fos.UUCP> <13@microsoft.UUCP> <362@bnr-fos.UUCP> <59@microsoft.UUCP>
Reply-To: mlord@bnr-rsc.UUCP (Mark Lord)
Organization: Bell-Northern Research, Ottawa, Canada
Lines: 82

In article <59@microsoft.UUCP> w-colinp@microsoft.uucp (Colin Plumb) writes:
 [lines from my article deleted]
>
>Yes, but if you have at least one 32-bit pointer per code item, odds are
>very good your other data is much more than 32 bits long.  So again, the
>waste (limited to 3 bytes, usually) is trivial.
Very trivial indeed for a single data item.  Now consider a large number
of such data items, dynamically allocated and maintained in data structures
based on linked lists.  Say, ten million of them.  At 1 to 3 wasted bytes
per item, this could cost between 10 and 30 megabytes of store, simply
because everything was 32-bit aligned.

 [lines from my article deleted]
>
>Actually, it can be done by machine.  A simple working algorithm is to sort
>the elements by size, starting with the largest.  And, as I said, the wasted
>space has an upper bound of (greatest alignment restriction)-(size of smallest
>data item), usually 4 and 1 bytes, respectively.  Sometimes 8 and 1.
>
Okay, no problem.  At least part of the process can be automated.  Be careful,
though: quite often structures are used as sub-fields of other structures,
or are declared in minimal (public) form in interface modules, and then in
extended (private) detail in the actual implementation modules (yes, there
are very high level languages which support this and more).  Also, there
have been clever programmers who did tricky and nasty things, such as
overlaying one structure with another to achieve effects like outputting
a stream of bytes to a block device very fast.  Such things can be done in
more portable ways at little speed expense, but remember, this is existing
working code we are talking about, and the tricky programmers may have moved
on to other areas/companies.

Note that using structures as sub-fields of other structures can increase
the alignment overhead (or else add considerable compiler overhead and
headaches) in some cases, since one would probably want to ensure correct
alignment of both the inner and the outer structure.

>If the structures aren't dumped, raw, to a file, I would have no hesitation
>reordering structures in my code.
>
Or used as part of messages which are passed on to other devices/systems,
or overlaid with other structures by nasty code (see above).  Such code is
not always easy to find in a forest of 14 million lines, especially when
*none* of it is *my* code.

 [lines from my article deleted]
>
>Given that that software needs to be recompiled anyway, and is resistant
>to big/little endian changes (i.e. doesn't use disgusting tricks), I
>don't see how it can have much dependency on struct packing.
>
That's right.  Just recompile it and it works.  I like it!  How come real
life is seldom this simple?

 [lines from my article deleted]
>
>CISC processors have exactly the same problems.  They still have to break
>it up into two accesses.  The only problem is that you don't have the
>flexibility.  If the second one fails, the VAX will restart both accesses,
>while a 68020 will only reexecute the second.  In both cases, an arbitrarily
>long time can pass between the first (the first first, in the case of the
>VAX) access and the second.  How is this different from what a RISC chip
>provides?
I didn't actually have data faults in mind here.  Of more concern were
interrupts occurring between the multiple accesses, with the interrupt code
using or modifying the data in question.  Sloppy programming by modern
standards, perhaps, but we are talking about not-so-modern code here.

>
>I don't quite understand the problem.  Alignment restrictions are almost
>zero hassle in most code, are enforced by compilers even on processors
>which don't need them for efficiency reasons (the Microsoft C compiler
>inserts pad bytes in the instruction stream to get the target of a branch
>instruction word-aligned), and simplify the hardware tremendously.
>It seems like win-win.  I never really used what I'm giving up, anyway.
Perhaps I've just had to deal with such problems a little too often over the
past five years.  Thanks for your viewpoint.  Perhaps the world IS mostly sane.

>-- 
>	-Colin (uunet!microsoft!w-colinp)
-Mark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I made all of this up for my own benefit, not the Company's.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~