Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!crdgw1!uunet!mcsun!hp4nl!star.cs.vu.nl!kjb
From: kjb@cs.vu.nl (Kees J. Bot)
Newsgroups: comp.os.minix
Subject: Re: #! in MM -- take 2
Message-ID: <10105@star.cs.vu.nl>
Date: 28 May 91 12:48:33 GMT
References: <klamer.674467423@mi.eltn.utwente.nl> <10033@star.cs.vu.nl> <klamer.675068096@mi.eltn.utwente.nl> <10055@star.cs.vu.nl> <1991May24.164952.22295@Arco.COM> <819@philica.ica.philips.nl>
Sender: news@cs.vu.nl
Lines: 60

adrie@philica.ica.philips.nl (Adrie Koolen) writes:

>I compiled a program containing:

>	char s1[] = "";
>	char s2[] = "a";
>	char s3[] = "ab";
>	char s4[] = "abc";
>	char s5[] = "abcd";
>	char s6[] = "abcde";
>	...
>	printf("%08x, %08x, %08x, %08x, %08x, %08x.\n", s1, s2, s3, s4, s5, s6);

>on a SparcStation. The Sun C compiler aligned all strings to word addresses:

>	000040a8, 000040ac, 000040b0, 000040b4, 000040b8, 000040c0.

This is not quite true.  If you change the declarations to look like:

	char *s3 = "ab";

Then you will find that the strings now have an alignment of 1.  The Sun
compiler habitually gives every global object an alignment of 4 in the
"data" segment, while the unnamed strings are put in the "data1" segment
with an alignment of 1.  In the s3[] declaration quoted above, the compiler
sees "ab" as the initializer { 'a', 'b', '\0' } for a named array, not as a
string that goes in the data1 segment.
The gcc compiler is a bit smarter: it notices that the s3 array doesn't
need an alignment of 4.

Compile the program with 'cc -S', 'gcc -S', and 'gcc -S -fwritable-strings'
for both the [] and the * versions and study the '.s' output.

For people who do not understand the alignment business yet, I will
try to explain...

Most machines these days treat memory, at the hardware level, as an array
of words, and every memory access is a word access.  To read the word at
byte address 32, the processor puts word address 8 on the address bus
(assuming 4-byte words: 32 / 4 = 8).  Reading a word from address 33 will
give a bus error if you are lucky; otherwise the processor fetches both
word 8 and word 9 from memory and picks out the proper bytes.  Needless to
say, writing a misaligned word is even more expensive.  Reading a halfword
at an even address, or a byte at *any* address, makes the processor read
the whole word that contains that halfword or byte.  To write a byte, the
processor may need to read the word first and then write it back with the
new byte in it.  I think the old PDP-11 did it this way, but I believe most
modern processors have a way of specifying which bytes in a word need to be
written.  (Pins on a CPU used to be expensive; maybe they still are.)

(Things are always different in reality.  The Sun 4/330s here at the VU
like their memory best when it is fitted as eight 8-bit-wide SIMMs, which
suggests that they do memory transfers in 64-bit chunks.)

One note on the ANSI C standard:  our local C guru informed me that the
only concrete promise the standard makes about alignment is that malloc
returns an address suitably aligned for any data type; the actual
alignment requirements are left to the implementation.
--
	                        Kees J. Bot  (kjb@cs.vu.nl)
	              Systems Programmer, Vrije Universiteit Amsterdam