Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!crdgw1!uunet!mcsun!hp4nl!star.cs.vu.nl!kjb
From: kjb@cs.vu.nl (Kees J. Bot)
Newsgroups: comp.os.minix
Subject: Re: #! in MM -- take 2
Message-ID: <10105@star.cs.vu.nl>
Date: 28 May 91 12:48:33 GMT
References: <10033@star.cs.vu.nl> <10055@star.cs.vu.nl> <1991May24.164952.22295@Arco.COM> <819@philica.ica.philips.nl>
Sender: news@cs.vu.nl
Lines: 60

adrie@philica.ica.philips.nl (Adrie Koolen) writes:
>I compiled a program containing:
>	char s1[] = "";
>	char s2[] = "a";
>	char s3[] = "ab";
>	char s4[] = "abc";
>	char s5[] = "abcd";
>	char s6[] = "abcde";
>	...
>	printf("%08x, %08x, %08x, %08x, %08x, %08x.\n", s1, s2, s3, s4, s5, s6);
>on a SparcStation. The Sun C compiler aligned all strings to word addresses:
>	000040a8, 000040ac, 000040b0, 000040b4, 000040b8, 000040c0.

This is not exactly true, if you change the declarations to be like:

	char *s3 = "ab";

Then you will find that the strings now have an alignment of 1.  The Sun
compiler has the habit of giving all global objects an alignment of 4 in
the "data" segment.  The unnamed strings are put in the "data1" segment
with an alignment of 1.  The string "ab" in the s3[] declaration above is
seen by the compiler as an { 'a', 'b', '\0' } initializer and not as a
string that goes in the data1 segment.  The gcc compiler is a bit smarter
by noticing that the s3 array doesn't need an alignment of 4.

Compile the program with 'cc -S', 'gcc -S', and
'gcc -S -fwritable-strings' for both the [] and the * versions and study
the '.s' output.

For people who do not understand the alignment business yet, I will try
to explain...

Most machines these days look at their memory as an array of words at the
hardware level.  All accesses to memory are word accesses.  If one wants
to read a word at address 32, then the processor puts memory address 8 on
the address bus (assuming 4 byte words).
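
The address arithmetic above can be sketched in a few lines of C.  This
is only an illustration, assuming 4 byte words and a memory that is
indexed by word number.  The names word_index, word_offset,
words_touched and describe_access are made up for this sketch; real
hardware does this in the bus interface, not in software.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 4-byte words, as in the example above. */
#define WORD_SIZE 4u

/* Word number the memory system sees for the byte at addr. */
unsigned word_index(unsigned addr)
{
    return addr / WORD_SIZE;
}

/* Byte offset of addr inside its word; 0 means word aligned. */
unsigned word_offset(unsigned addr)
{
    return addr % WORD_SIZE;
}

/* How many words an access of nbytes starting at addr touches. */
unsigned words_touched(unsigned addr, unsigned nbytes)
{
    assert(nbytes > 0);
    return word_index(addr + nbytes - 1) - word_index(addr) + 1;
}

/* Print the mapping for one access (hypothetical helper). */
void describe_access(unsigned addr, unsigned nbytes)
{
    printf("%u byte(s) at address %u: word %u, offset %u, %u word(s) touched\n",
           nbytes, addr, word_index(addr), word_offset(addr),
           words_touched(addr, nbytes));
}
```

Calling describe_access(32, 4) reports word 8, offset 0 and one word
touched.  An access that starts at a nonzero offset and runs past the
end of its word touches two words.
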
Reading a word from address 33 will lead to a bus error if you are lucky,
or the processor will fetch both words 8 and 9 from memory to find the
proper bytes.  Needless to say, writing a misaligned word is even more
expensive.

Reading halfwords at an even address or reading a byte at *any* address
makes the processor read the word that contains that halfword or byte.
To write a byte the processor may need to read a word first before
writing it back with the new byte in it.  I think the old PDP-11 did it
this way, but I think most modern processors have a way of specifying
which bytes in a word need to be written.  (Pins on a CPU used to be
expensive, maybe they still are.)

(Things are always different in reality.  The Sun 4/330's here at the VU
like their memory best if served as 8 times 8 bit wide SIMM-modules,
which seems to indicate that they do memory transfers in 64 bit chunks.)

One note on the ANSI C standard:  Our local C guru informed me that the
only thing the standard says about alignment is that malloc returns an
address that is suitably aligned for any datatype, and nothing more.
--
Kees J. Bot  (kjb@cs.vu.nl)
Systems Programmer, Vrije Universiteit Amsterdam