Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!brl-adm!brl-smoke!gwyn
From: gwyn@brl-smoke.ARPA (Doug Gwyn )
Newsgroups: comp.lang.c
Subject: Re: Is an object made up of bytes?
Message-ID: <5527@brl-smoke.ARPA>
Date: Fri, 16-Jan-87 20:50:39 EST
Article-I.D.: brl-smok.5527
Posted: Fri Jan 16 20:50:39 1987
Date-Received: Mon, 19-Jan-87 23:44:05 EST
References: <2144@brl-adm.ARPA> <5497@brl-smoke.ARPA> <1987Jan15.215225.9688@sq.uucp>
Reply-To: gwyn@brl.arpa (Doug Gwyn (VLD/VMB) )
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 54

In article <1987Jan15.215225.9688@sq.uucp> msb@sq.UUCP (Mark Brader) writes:
>Richard Stallman says [in effect]:
>> I am not sure whether the standard implies that, given "short in, out;",
>>	{ char *inptr, *outptr; int i;
>>	  inptr = (char *) &in; outptr = (char *) &out;
>>	  for (i = 0; i < sizeof (short); i++) outptr[i] = inptr[i]; }
>> is defined and equivalent to "out = in;".
>and Doug Gwyn replies:
>$ No, this can't be guaranteed.  For example, there may be bits
>$ in the short that are not covered by its chars.
>I'm pretty sure this is wrong.  The draft proposed standard says:

Mark is, I think, correct in his assessment of the nature of bytes
in the X3J11 model of C objects.  However, I had something else in
mind but due to interruptions while preparing my response I didn't
get it worded correctly.  (The extra bits I had in mind were tag
bits; see below for a corrected version.)  I'll try again..

The things that prevent RMS's approach from working portably are:

The semantics of "(char *) &object" aren't guaranteed to produce
anything that can be safely dereferenced to access a char.  The only
guarantee is that the opposite conversion can be made subsequently
without losing information.
This can be an issue for machines that don't support byte
addressing; to keep pointer arithmetic simple, the high-order bits
of a pointer may indicate the size of its dereferenced type; in
such a case, if the cast is merely a word transfer without the bits
being shifted and otherwise rearranged, the cast (char *) does not
produce a useful address.

Even if the resulting char pointer designates a char, it might not
be the char that one would guess.  On "little endian" machines it
probably would be, but there may be "big endian" byte-addressed
architectures where the numeric address of a word is not the
lowest-valued address of the bytes within the word; in this case
the loop in the example would copy the wrong collection of bytes
(assuming again that the cast is implemented as a simple word
transfer without being rearranged specifically to make such examples
work, which would involve additional overhead).

In a tagged architecture, the pointed-at object may not be
referenced as the wrong type without causing a machine trap.

In general, I believe X3J11 intended to strongly discourage ANY
reliance on "type punning".

P.S.  Upon re-reading 3.3.4 Semantics, I see that RMS and I
interpreted the use of the word "may" differently.  Comparison with
other sections of the document now leads me to believe that RMS was
probably correct in thinking that pointer<->integer conversion via
casts MUST be supported by a conforming implementation, although
enough is left "implementation-defined" that an implementation could
choose to make this a useless operation.  This means that some
restriction on use of externs in initializers really is necessary
(to prevent having to support complete C-arithmetic in linkers) if
the typical implementation is to give useful meaning to such
conversions.  This deficiency in the draft standard needs to be
fixed.