Path: utzoo!mnetor!uunet!sdrc!scjones From: scjones@sdrc.UUCP (Larry Jones) Newsgroups: comp.lang.c Subject: Re: Noalias trivia question Message-ID: <270@sdrc.UUCP> Date: 30 Apr 88 16:44:49 GMT References: <56@lakart.UUCP> <1164@maccs.UUCP> Organization: Structural Dynamics Research Corp., Cincinnati Lines: 157 Summary: A not-so brief history of noalias and optimization In article <1164@maccs.UUCP>, gordan@maccs.UUCP (gordan) writes: > In all the time noalias was discussed on comp.lang.c, just about no one > spoke up in favor of it. In the recent X3J11 meeting it was reportedly > voted down by well over the required (2/3 ?) majority. > > And yet, it must have once gotten a similar majority in favor, not so > very long ago (January?). December. > So, like, who on X3J11 will admit to having voted for (or worse, > proposed) noalias in the first place? Why did it seem like a good idea > at the time, and what were the reasons for dropping it (apart from > universal opprobrium). Enquiring minds &c. Well, being one of those fools who rush in where angels fear to tread, I will admit to having voted for noalias (before you start sending letter bombs, I also voted to remove it). How did noalias come to be? One of the committee's goals has been to define a language that will be broadly usefull. This has lead to all kinds of conflicting demands that have to be reconciled somehow. For instance, people using C for general programming want the compilers to optimize the heck out of everything so their programs run as fast as possible. People writing system software like memory mapped device drivers want no optimization at all. These conflicting demands eventually lead to the "volatile" keyword which turned out to be a double win -- the general types get the optimization they want and the system types get the control they want plus they get all of the optimizations for the parts of the code that aren't accessing hardware. One of the issues relating to optimization that the committee has wrestled with for a long time is the problem of aliasing. When you're trying to optimize generated code, it is, of course, necessary to make sure that the resulting code still gets the "right answer". To do that, you need to be very careful to keep track of exactly how and where the value of an expression can be changed. This is particularly important when you have some type of parallel processing since having multiple CPUs makes the benefits of changing the order in which things occur much greater since many things can be going on at the same time. Unfortunately, keeping track of what's going on in a C program is next to impossible. Most of the problems are caused by pointers -- although some pointers point are the only way to get to the object they point to, most pointers are aliases for objects that can be accessed in other ways as well. By doing global flow analysis of the entire program it is (theoretically) possible to figure out all the things that a pointer can possibly point to but this is extremely difficult. It is impossible given that C allows separate compilation since this means that the compiler can't look at the whole program. Traditionally, this has been dealt with by assuming that anything that has had its address taken or could have had its address taken (e.g. globals) is changed any time something is changed through a pointer. Similarly, changing any value through a pointer is assumed to change the values pointed to by all pointers. Although these pessimistic assumptions are quite safe, they prevent most optimizations. For instance, most users of vector processors are surprised that the following function: void addvec(double a[], double b[], double c[], int n) { int i; for (i = 0; i < n; i++) a[i] = b[i] + c[i]; } does not result in a single vector addition (especially since writing the "same" function in FORTRAN does). Actually, "angry" is probably a better term that "surprised" in this case. The reason for this is, of course, that in C there is nothing wrong with calling this function like: addvec(a+1, a, a+1, sizeof a / sizeof a[0] - 1); which sums the vector into its last element. This is invalid in FORTRAN which is why the FORTRAN version can vectorize but the C version can't. This turns out to be a major issue for every vendor with parallel hardware. Lest you think that this is a fairly esoteric group, just remember that every IBM PC with a math coprocessor is a parallel machine (although existing software doesn't make much use of the capability yet). Also note that this is not a specsmanship battle either -- users buy parallel machines to get performance and they get very hostile when they don't get it. Minor details like their language of choice not being suited to vectorization are not of interest. This is the atmosphere that existed at the December meeting. Many of the people with compilers for parallel machines were saying that the lack of a method for specifying aliasing was sufficient reason for a "no" vote on the standard and proposed a number of possible solutions. One was to change the meaning of parameters declared as arrays to be pointers to things that don't overlap each other. This was accepted fairly well except that there was concern that it might break existing code which uses array and pointer notation interchangably. A possible solution to this objection was to only assign the new meaning in prototypes, but that introduced an incompatibility between declarations in and out of prototypes that many people objected to. Other interesting variations on syntax were introduced and rejected for various reasons. Also, it was pointed out that this only solved the problem with function arguments, it did not solve the general problem. None of the proposals got the necessary 2/3 vote for adoption, but a clear majority of the committee was interested in finding a solution to the problem. Finally, the people who were most interested in the issue agreed to form a working group to try to concoct a solution which would be acceptable to the majority of the committee. On Friday, the working group presented the solution they had arrived at -- "noalias". What was presented was only the concept, the exact wording still had to be worked out. It seemed to be everything anyone could want: the default was that everything was aliased so no existing code would be broken, you could specify it on parameters to get the non-overlapping parameters (which all the library functions except memmove required - now we could specify the restriction in the prototypes instead of having a disclaimer in English!), and you could use it on pointers and objects just like "const" and "volatile" to solve the general problem. What more could anyone want? Well, there were a number of concerns, of course. We hadn't much time to look at it and think about the possible consequences. We were trying to get the draft out for the second (and last) public review. We didn't have the exact wording. Interestingly, noone brought up the problem that finally caused its downfall - it was too hard for mere mortals to get right; particularly in large programs and during maintenance. Guess we were all wearing our wizard hats when we should have been wearing our manager hats. Finally, since we had all agreed that we wanted to solve the problem and this looked like a good solution, we agreed to add it to the draft. If it really was a good solution (and it seemed like it was) then it would have been reviewed and, even if the words weren't quite right, we could just change them editorially without having to have another public review. Of course, having all those other people review it rather that just us would point out any problems we may have missed (did I just hear some ominous music in the background?). After that, you know what happened. It got mentioned here and in other forums as well. The reactionaries immediately flamed it, the rational expressed concern but reserved judgement pending the exact wording. The exact wording came and was judged incomprehensible. The rational, having carefully considered all the ramifications, denounced it. Dennis Ritchie, father of C, denounced it. Henry Spencer put Dennins's denouncement into his signature. Many people made quite persuasive, rational, arguments that it was not "a good thing". April came and the committee, burnt, blackened, covered with soot and smelling of smoke, reconvened. Although only a very few of the committee members are foolish enough to post things here and expose themselves to the direct ire of the net, many of them do read it. The rational arguments won. Although Dennis showed up in person for the first time in the history of the committee, most of the members had already made up their minds and noalias was summarily removed. There was a brief flurry of activity to try to fall back to one of the other proposals (most notably the array arguments may not overlap proposal), the committee was so gun-shy that even an attempt to simply declare array args an obsolescent feature so it could be changed in a future standard was defeated. And that's the way it was (at least that's the way I remember it, which may or may not have much to do with reality). ---- Larry Jones UUCP: ...!sdrc!scjones SDRC AT&T: (513) 576-2070 2000 Eastman Dr. BIX: ltl Milford, OH 45150 "When all else fails, read the directions."