Xref: utzoo comp.arch:14137 comp.lang.c:26179
Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!unido!mikros!mwtech!martin
From: martin@mwtech.UUCP (Martin Weitzel)
Newsgroups: comp.arch,comp.lang.c
Subject: Re: RISC Machine Data Structure Word Alignment Problems?
Keywords: risc sun
Message-ID: <645@mwtech.UUCP>
Date: 21 Feb 90 14:35:19 GMT
References: <111@melpar.UUCP> <1990Jan21.224826.1699@esegue.segue.boston.ma.us> <1925@l.cc.purdue.edu>
Reply-To: martin@mwtech.UUCP (Martin Weitzel)
Organization: MIKROS Systemware, Darmstadt/W-Germany
Lines: 130

Some recent postings pointed out (or complained about) 'holes' in
C struct definitions. I hope some readers will benefit if I explain
an alternate point of view of C structs and give some advice on how
to access a given byte layout in memory in a portable (and
nevertheless painless) way that avoids structs completely. Because
the latter may be of more interest, I'll come to it first.

Suppose you have some library function 'getmsg' to which you supply
the address of a buffer, and when the function returns it has filled
the buffer with the following information:

      2 Byte Integer - length of message
      1 Byte         - several flag bits
      1 Byte         - type of message
      4 Byte Integer - checksum
    100 Byte         - arbitrary message

Many C programmers now think about defining the following

    struct m {
        short m_length;
        unsigned char m_flags;
        char m_type;
        unsigned long m_checksum;
        char m_bytes[100];
    } buffer;

so that after a 'getmsg(&buffer)' they can access the individual
parts 'by name', eg:

    buffer.m_length, buffer.m_flags, ....

... and as the previous posters pointed out, they eventually get
trapped by the 'holes' the compiler inserts into the struct for the
sake of efficiency.
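Whether a given compiler inserts such holes can be checked by comparing
the struct's layout with the message layout listed above. The following
is only a sketch: the names WIRE_SIZE, layout_differs, print_layout and
require_exact_layout are made up for illustration, and the numbers
reported depend entirely on the compiler and machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct m {
    short         m_length;
    unsigned char m_flags;
    char          m_type;
    unsigned long m_checksum;
    char          m_bytes[100];
};

/* Size of the message as getmsg delivers it: 2 + 1 + 1 + 4 + 100 bytes. */
enum { WIRE_SIZE = 2 + 1 + 1 + 4 + 100 };

/* Returns 1 if struct m cannot be laid directly over the message bytes. */
int layout_differs(void)
{
    return sizeof(struct m) != WIRE_SIZE
        || offsetof(struct m, m_flags)    != 2
        || offsetof(struct m, m_type)     != 3
        || offsetof(struct m, m_checksum) != 4
        || offsetof(struct m, m_bytes)    != 8;
}

/* Prints where this compiler actually placed the components. */
void print_layout(void)
{
    printf("sizeof(struct m) = %lu (message: %d)\n",
           (unsigned long)sizeof(struct m), (int)WIRE_SIZE);
    printf("m_checksum at %lu (message: 4)\n",
           (unsigned long)offsetof(struct m, m_checksum));
    printf("m_bytes    at %lu (message: 8)\n",
           (unsigned long)offsetof(struct m, m_bytes));
}

/* Guard for code that insists on overlaying the struct: aborts on mismatch. */
void require_exact_layout(void)
{
    assert(!layout_differs());
}
```

Calling print_layout() from a small main() shows the offsets chosen on
the machine at hand. A compiler whose 'unsigned long' is wider than 4
bytes, for example, will report a mismatch.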
My advice in this situation is to change this code as follows:

    char buffer[
          2     /* length of message */
        + 1     /* several flag bits */
        + 1     /* type of message */
        + 4     /* checksum */
        + 100   /* arbitrary message */
    ];

    #define m_length(b)   (*(short *)         ((char *)(b) + 0))
    #define m_flags(b)    (*(unsigned char *) ((char *)(b) + 2))
    #define m_type(b)     (*(char *)          ((char *)(b) + 3))
    #define m_checksum(b) (*(unsigned long *) ((char *)(b) + 4))
    #define m_bytes(b)    (                    (char *)(b) + 8 )

(I inserted some white space for readability.) Note that the offset
must be added to the 'char *' *before* the cast to the component's
type; otherwise the addition is scaled by the size of that type and
'm_checksum' ends up at the wrong address.

The least you must know of your compiler in that case is that a
'char' occupies exactly one byte in an 'array of char', and that
'short' and 'unsigned long' have the 2 and 4 byte sizes the message
layout calls for. As before, you can access the individual parts
'by name' as follows:

    m_length(buffer), m_flags(buffer), ....

If 'getmsg' is always supplied with the same buffer, you can make it
even simpler by avoiding parametrized macros and using

    #define m_length (*(short *)buffer)
    #define m_flags  (*(unsigned char *)(buffer + 2))
    ......

Note that the above expressions are also 'lvalues', ie you can use
them on the left side of an assignment.

There remains only the minor problem that 'buffer' must be properly
aligned. (Techniques for achieving this are shown in K&R - you
simply have to define buffer as a union with a member of the type
whose alignment is desired. Alternatively you may allocate the
buffer with 'malloc'.)

If your concern is only 'reading' the elements out of the buffer,
you have the additional benefit that you can transparently
compensate for possible 'byte-order' problems. Suppose the message
is produced by some piece of hardware that assumes the LSB of a
16 bit integer at the lower address, and you want to move this
hardware to a system where the CPU takes just the opposite view.
All you have to change is:

    #define m_length ((short)( \
              (*(unsigned char *)(buffer + 1) << 8) \
            |  *(unsigned char *) buffer))
    .......

(Hope I missed no brackets ... :-))

Now back to an alternate view of C structs; hit 'n' if you are no
longer interested.
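(A footnote to the buffer technique above: the same byte-wise idea can
be written as small functions instead of macros, which also removes the
alignment requirement, since only single bytes are ever accessed. This
is a sketch under assumptions. The names get_le16, get_le32, put_le16
and the OFF_* constants are invented here, and it assumes the 4 byte
checksum is stored LSB first like the length, which the description of
'getmsg' does not actually say.)

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets of the fields within the message, as listed earlier. */
enum {
    OFF_LENGTH   = 0,
    OFF_FLAGS    = 2,
    OFF_TYPE     = 3,
    OFF_CHECKSUM = 4,
    OFF_BYTES    = 8
};

/* Read a 16 bit value whose LSB sits at the lower address. */
unsigned int get_le16(const unsigned char *p)
{
    assert(p != NULL);
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Read a 32 bit value stored LSB first (assumed byte order, see above). */
unsigned long get_le32(const unsigned char *p)
{
    assert(p != NULL);
    return  (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}

/* Store a 16 bit value LSB first; the counterpart for writing. */
void put_le16(unsigned char *p, unsigned int v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
}
```

With 'unsigned char buffer[108]', the length is then read as
get_le16(buffer + OFF_LENGTH) and the checksum as
get_le32(buffer + OFF_CHECKSUM), regardless of the byte order of the
CPU that runs the code. The price is that these are no longer lvalues.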
IMHO many features of the C language can be explained elegantly and
easily if you 'translate' the feature to the 'machine level'. (Eg I
explain much about pointers and arrays to my classes by sketching
pictures of the contents of the data segment.) What is easily
misunderstood here is that such an explanation often describes only
*one* possible way to implement the abstract concept.

Though it seems natural to think of a C struct as a collection of
individual variables located at increasing memory addresses in the
order they are declared(%) as struct components, it often makes more
sense to see a C struct only as a collection of data items that are
guaranteed *not* to overlap(%%). Furthermore the compiler guarantees
that access to a named struct component will always refer to the
same part of memory, even when all that is known is the struct's
address (important when passing struct pointers as function
parameters).

The other guarantee, that the struct components are located (more or
less) adjacent in memory, is only of some 'practical' value,
especially if you have an 'array of struct' or write one struct to a
file (using write/fwrite together with sizeof), but it has nothing
to do with the abstract concept of a C struct.

(%): Even the guarantee that the struct elements are at ascending
addresses in the order they are declared was IMHO only given to
avoid complex (and hard to understand) rules about when it would and
would not be allowed to rearrange the elements. Readers who know
other good reasons why this guarantee is given are welcome to
correct me (hello Chris :-)).

(%%): Note that in the case of a C union the guarantee is *not* that
the elements overlap: They only *may* overlap (unless they are of
the same type, or they are different C structs with components of
the same type at the beginning, which leads back to the problem of
when rearranging could and could not have been allowed ... again,
correct me if I'm wrong).
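The two struct guarantees discussed above (ascending addresses in
declaration order, no overlap) can be checked mechanically with
'offsetof' and 'sizeof'. What follows is only a sketch: 'struct span',
SPAN, m_spans and ascending_and_disjoint are illustrative names, and
'struct m' is the example struct from the first part of this posting.

```c
#include <assert.h>
#include <stddef.h>

struct m {
    short         m_length;
    unsigned char m_flags;
    char          m_type;
    unsigned long m_checksum;
    char          m_bytes[100];
};

/* Where one component lives inside its struct: start offset and size. */
struct span {
    size_t off;
    size_t size;
};

/* Builds the span of one component; sizeof does not evaluate its operand. */
#define SPAN(type, member) \
    { offsetof(type, member), sizeof(((type *)0)->member) }

/* The components of struct m, listed in declaration order. */
static const struct span m_spans[] = {
    SPAN(struct m, m_length),
    SPAN(struct m, m_flags),
    SPAN(struct m, m_type),
    SPAN(struct m, m_checksum),
    SPAN(struct m, m_bytes)
};

/* Returns 1 if every span starts at or after the end of the one before it. */
int ascending_and_disjoint(const struct span *s, size_t n)
{
    size_t i;

    assert(s != NULL);
    for (i = 1; i < n; i++)
        if (s[i].off < s[i - 1].off + s[i - 1].size)
            return 0;
    return 1;
}
```

For struct components the check should succeed on any conforming
compiler. Holes show up merely as gaps between one span and the next,
which the check deliberately permits.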
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83