Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!cs.utexas.edu!sun-barr!newstop!exodus!rbbb.Eng.Sun.COM!chased
From: chased@rbbb.Eng.Sun.COM (David Chase)
Newsgroups: comp.lang.modula3
Subject: Re: optimization and garbage collection
Message-ID: <5293@exodus.Eng.Sun.COM>
Date: 4 Jan 91 23:19:14 GMT
References: <91Jan2.131022pst.3117@arcturia.parc.xerox.com> <5142@exodus.Eng.Sun.COM> <39976@super.ORG>
Sender: news@exodus.Eng.Sun.COM
Organization: Sun Microsystems, Mt. View, Ca.
Lines: 80

rminnich@super.org (Ronald G Minnich) writes:
>David Chase wrote::
>|> Please note that this is very probably not a bug in the optimizer;
>|> such behavior was predicted over 3 years ago.
>Any objection a more detailed explanation?

Nope (and this is the second request, counting e-mail queries, so....)

What happens (in broad terms) is this:

The C compiler is not required to maintain the variables, as you write them, with the values that you expect to see in them. The variables may be folded into common-subexpression, loop-invariant, or induction-variable computations (because it makes the code go faster), and the registers that held the original variables (now dead) are reused for temporaries. These temporaries may contain positive or negative offsets from pointers, or even the difference of two pointers. The garbage collector goes looking for pointers to the beginning of an object, finds none, and reclaims the object (incorrectly). Any temporaries that still refer to the object now hold dangling pointers.

Note well that this is correct, standard-conforming behavior on the part of the (C) optimizer. The garbage collector is relying on artifacts of a particular implementation of C. Those artifacts are not required to exist, were not intended by the compiler writers when they wrote the compiler, and are not checked by a single line of test code.
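A toy model of the failure, for illustration only: reachable_base_only and derived_roots are hypothetical names, and the "collector" is nothing more than the mark test of a conservative scanner that, like the one described above, honors a word only if it equals the beginning of an object. Addresses are handled as uintptr_t so that the negative offset does not require out-of-bounds pointer arithmetic, which C leaves undefined.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mark test: an object stays alive only if some root word
   equals its base (the address of its first byte). Returns 1 if reachable. */
static int reachable_base_only(uintptr_t base, const uintptr_t *roots, size_t nroots)
{
    for (size_t i = 0; i < nroots; i++)
        if (roots[i] == base)
            return 1;
    return 0;
}

/* Hypothetical helper: fills out[] with values an optimizer might leave in
   registers once the variable holding p is dead. Done on uintptr_t because
   forming p - 8 as a C pointer would be undefined behavior. */
static size_t derived_roots(uintptr_t p, uintptr_t q, uintptr_t out[3])
{
    out[0] = p + 4; /* positive offset into the object */
    out[1] = p - 8; /* negative offset from the object */
    out[2] = q - p; /* difference of two pointers */
    return 3;
}
```

With a = malloc(16) and b = malloc(16), scanning only the three derived values from derived_roots((uintptr_t)a, (uintptr_t)b, roots) leaves the object at a unmarked, so it would be reclaimed; appending (uintptr_t)a itself to the roots makes it reachable again.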
This problem exists whenever you use C as an intermediate language, optimize it, and make use of a conservative stack/register-scanning garbage collector; it is not peculiar to Modula-3. For example, suppose you wrote your own copy of strcpy:

    char * strcpy (char * s1, char * s2) {
      int i;
      for (i = 0; s2[i] != 0; i++)
        s1[i] = s2[i];
      s1[i] = 0;
      return s1;
    }

This might reasonably be optimized to

    char * strcpy (char * s1, char * s2) {
      char * t = s1;
      while (*s2 != 0) {
        *s1 = *s2;
        s1++;
        s2++;
      }
      *s1 = 0;
      return t;
    }

No memory allocation takes place in the inner loop, so it appears to be OK to toss the pointer to the beginning of the source string (t survives only because it is the return value). However, the same optimization applies if a procedure is called within the loop, as in:

    char * tweaked_strcpy (char * s1, char * s2, char (*f)(char)) {
      char * t = s1;
      while (*s2 != 0) {
        *s1 = (*f)(*s2);
        s1++;
        s2++;
      }
      *s1 = 0;
      return t;
    }

The procedure f can do whatever it chooses, including allocating memory. And, with preemptively scheduled threads, a garbage collection can occur at any time.

David Chase
Sun Microsystems
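To make the last point concrete, here is a hypothetical instrumented version of the optimized tweaked_strcpy. The names live_regs, simulated_gc, allocating_upcase, and tweaked_strcpy_traced are inventions for illustration: live_regs models the registers a conservative scanner would see while f runs, and the callback stands in for an f that allocates and so triggers a toy base-pointer-only collection. Stack arrays stand in for heap strings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the registers visible while f runs: after
   optimization only s1, s2 and t are live. */
static const char *live_regs[3];

static uintptr_t watched_base;  /* base address of the source string */
static int collections;         /* simulated collections so far */
static int source_unreachable;  /* collections that found no base pointer */

/* Toy collection, base-pointer-only rule: the source string survives only
   if some live register holds exactly its first address. */
static void simulated_gc(void)
{
    int found = 0;
    for (size_t i = 0; i < 3; i++)
        if ((uintptr_t)live_regs[i] == watched_base)
            found = 1;
    collections++;
    if (!found)
        source_unreachable++;
}

/* A callback that "allocates" and therefore triggers a collection.
   The case conversion assumes ASCII. */
static char allocating_upcase(char c)
{
    simulated_gc();
    return (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A') : c;
}

/* The optimized tweaked_strcpy, instrumented to publish its live
   values before each call to f. */
static char *tweaked_strcpy_traced(char *s1, char *s2, char (*f)(char))
{
    char *t = s1;
    while (*s2 != 0) {
        live_regs[0] = s1;
        live_regs[1] = s2;
        live_regs[2] = t;
        *s1 = (*f)(*s2);
        s1++;
        s2++;
    }
    *s1 = 0;
    return t;
}
```

Copying "abc" this way runs three simulated collections. Only the first finds the source's base address (still held in s2), so the other two would have reclaimed a string that the loop is still reading, while the destination survives every time because t holds its base.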