Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!mentor.cc.purdue.edu!j.cc.purdue.edu!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Sorting struct members for alignment (was Re: Unaligned Accesses)
Summary: Compiler technology nasty for C
Keywords: alignment
Message-ID: <11118@pur-ee.UUCP>
Date: 27 Mar 89 18:55:42 GMT
References: <37196@bbn.COM> <1989Mar16.190043.23227@utzoo.uucp> <24889@amdcad.AMD.COM> <355@bnr-fos.UUCP> <13@microsoft.UUCP> <362@bnr-fos.UUCP> <59@microsoft.UUCP>
Reply-To: hankd@pur-ee.UUCP (Hank Dietz)
Organization: Purdue University Engineering Computer Network
Lines: 41

In article <59@microsoft.UUCP> w-colinp@microsoft.uucp (Colin Plumb) writes:
>> Granted, by careful (human) reordering of the fields within a complex
>> structure, one can almost always reduce wasted space to less than one
>> machine word/line per allocated item. This is a good thing that we all do
>> when possible on new code, but it is much more difficult (and risky)
>> to go back and try to do this with code in a very large existing base
>> of (otherwise) good, working software.
>
>Actually, it can be done by machine. A simple working algorithm is to sort
>the elements by size, starting with the largest. And, as I said, the wasted
>space has an upper bound of (greatest alignment restriction)-(size of smallest
>data item), usually 4 and 1 bytes, respectively. Sometimes 8 and 1.
>
>If the structures aren't dumped, raw, to a file, I would have no hesitation
>reordering structures in my code.

All true. In the compilers courses I teach, I've been mentioning structure
member re-arrangement as an integral part of data structure allocation
within a compiler. It works really well for languages like Pascal; however,
there is a big problem with this for C code.

In C, it has been a very common idiom to ASSUME that the address of the
first member of a struct IS the address of the struct.
For example:

	struct s { char c; double d; } t;
	char *p = &t;

Now this might get you a warning message (types don't match), but the
loose usage is very common in real programs, and is often hidden in a
function:

	a = f(&t);

where f expects its argument to be a char *.

Using (nasty) global flow analysis, this use can be detected and even
mechanically corrected in MOST cases... but is it worth it? Opinions?

						-hankd@ee.ecn.purdue.edu
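
To make the trade-off above concrete, here is a minimal sketch in C, not
taken from any real compiler. It models the quoted "sort by size, largest
first" rule and shows why the first-member idiom breaks once members move.
The names struct member, layout, sort_members, s_decl, s_sorted and the
idiom_safe_* helpers are all made up for illustration, and the sizes and
alignments passed in are assumptions supplied by the caller rather than
properties of any particular machine.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */
#include <stdlib.h>   /* qsort */

/* Hypothetical description of one struct member (illustration only). */
struct member {
    const char *name;
    size_t size;    /* size in bytes */
    size_t align;   /* required alignment in bytes, must be nonzero */
};

/* qsort comparator: larger members sort first. */
int by_size_desc(const void *a, const void *b)
{
    const struct member *x = a, *y = b;
    return (x->size < y->size) - (x->size > y->size);
}

/* Reorder members largest first, as the quoted article suggests. */
void sort_members(struct member *m, size_t n)
{
    qsort(m, n, sizeof *m, by_size_desc);
}

/*
 * Lay members out in the order given, padding each one up to its
 * alignment. Stores each offset in offsets[] (if non-NULL) and returns
 * the total size rounded up to the strictest alignment seen.
 */
size_t layout(const struct member *m, size_t n, size_t *offsets)
{
    size_t off = 0, max_align = 1, i;

    for (i = 0; i < n; i++) {
        size_t a = m[i].align;
        assert(a != 0);
        off = (off + a - 1) / a * a;      /* pad up to alignment */
        if (offsets)
            offsets[i] = off;
        off += m[i].size;
        if (a > max_align)
            max_align = a;
    }
    return (off + max_align - 1) / max_align * max_align;
}

/* The struct from the post, in declaration order ... */
struct s_decl   { char c; double d; };
/* ... and as a reordering compiler might lay it out. */
struct s_sorted { double d; char c; };

/* Nonzero when (char *)&t would still point at member c. */
int idiom_safe_decl(void)   { return offsetof(struct s_decl, c) == 0; }
int idiom_safe_sorted(void) { return offsetof(struct s_sorted, c) == 0; }
```

Fed an assumed char (1 byte), double (8 bytes, 8-byte aligned), char
sequence, layout reports 24 bytes in declaration order and 16 after
sort_members. The second half shows the cost: with the double moved to
the front, c is no longer at offset 0, so code that treats &t as a
pointer to c silently reads the wrong bytes.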