Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!gatech!hubcap!ncrcae!ncrlnk!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.lang.c
Subject: Re: const, volatile, etc [was Re: #
Summary: But on other machines the optimizer has little effect
Message-ID: <475@aber-cs.UUCP>
Date: 2 Jan 89 21:32:32 GMT
References: <715@auspex.UUCP> <225800102@uxe.cso.uiuc.edu>
Reply-To: pcg@cs.aber.ac.uk
Distribution: eunet,world
Organization: UCW,Aberystwyth,WALES,UK
Lines: 180

[This was meant to be posted a few days ago, but failed...]

In article <225800102@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:

# To put it bluntly, that's rubbish. There are plenty of people out there
# who can, I suspect, testify that yes, indeed, optimizing C compilers
# *can* produce better code than non-optimizing ones, *even when the C
# source is, in some sense, "good" code.*

# I can and I will so testify. (I sent the exact timings to the persons
# involved by mail, but maybe it really is generally interesting.)

It is interesting, even if I see the same things differently.

# I tried compiling the original anti-optimizer flamer's matrix
                                                ^^^^^^
*Me* being a flamer? Surely you jest... :->

Let me insist that while I am highly opinionated, and love a tough debate, I tend to support my opinions with reasonable arguments, and I have admitted my errors when they were reasonably argued. Flaming (as I understand the word) denotes a more purely emotional kind of expression, and somebody other than me has been indulging in it.

Also, "anti-optimizer" is too coarse and general a label for my argument. It was against *aggressive optimizers* for *C*. Hang on...

# code with optimizations turned off. It took about 24 seconds to
# run. Putting in his pointer optimizations reduced the time to
# 16 seconds, as did running the optimizer on the original.
# BUT, running the optimizer on the pointerized code, the executable took
# 8 seconds.

I did try with g++ after receiving the results that Doug McDonald had the patience to obtain and send me. What I noticed was that my pointer code took the same time with or without optimization, and that it was almost twice as fast as the optimized matrix version (note that I used version 1.27, which does not yet have strength reduction).

My opinion is that McDonald's compiler was generating pretty rough code without the optimizer, and the optimizer simply made it more reasonable. There is a good tradition (did it start with C?) of building compilers with a poor but simple and *easy to debug* code generator, and then tacking onto its end an optional code improver/simplifier. This module has traditionally been called an *optimizer*, and I have no argument with it, except that in a sense it is misnamed.

Let me state for the nth time that I am not against *optimizers*; I don't advocate sloppy code generators. I am against *aggressive* optimizers. I don't think their results are worth the effort, the cost, and the reliability problems.

I have said that an *optimizer* is something that does a competent job of code generation by exploiting *language*- and *machine*-dependent opportunities (I even ventured to suggest a classification of good and bad optimizations). Aggressive optimization, to me, is what attempts to exploit aspects of the *program* or *algorithm*. I do not like this for three reasons: it requires the optimizer to "understand" the program in some sense, and I reckon that the programmer should do that; an optimizer can only "understand" the static aspects of a program and of the algorithm embodied in it; and experience suggests that KISS applies to compilers too, i.e. the more intelligent an optimizer is, the buggier it is likely to be and the harder its bugs.
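For readers who did not see the code being timed, here is a hypothetical sketch of the two styles (it is not the code posted in the thread): a direct indexed transcription of the matrix product, and the same loops with the index arithmetic strength-reduced by hand into pointer increments, roughly what a compiler with strength reduction might do on its own. The function names `matmul_indexed` and `matmul_pointers`, the flat row-major arrays of `double`, and the square n x n shape are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Direct transcription of the definition c[i][j] = sum over k of
   a[i][k] * b[k][j]. The n x n matrices are stored row-major in flat
   arrays of n*n doubles, so every access recomputes an index such as
   i*n + k. c must not overlap a or b. */
void matmul_indexed(size_t n, const double *a, const double *b, double *c)
{
    size_t i, j, k;
    double sum;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            sum = 0.0;
            for (k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* The same computation with the index arithmetic strength-reduced by
   hand: each i*n + k style multiplication becomes a pointer advanced by
   a constant stride (1 along a row of a, n down a column of b). The
   column pointer is advanced only while another element remains, so it
   never points past the end of b. Requires n > 0. */
void matmul_pointers(size_t n, const double *a, const double *b, double *c)
{
    register const double *ap, *bp, *rowend;
    register double sum;
    const double *arow, *bcol, *aend = a + n * n, *bend = b + n;
    double *cp = c;

    assert(n > 0);
    for (arow = a; arow < aend; arow += n) {        /* each row of a */
        rowend = arow + n;
        for (bcol = b; bcol < bend; bcol++) {       /* each column of b */
            ap = arow;
            bp = bcol;
            sum = *ap * *bp;                        /* inner product of */
            while (++ap < rowend) {                 /* row and column   */
                bp += n;
                sum += *ap * *bp;
            }
            *cp++ = sum;
        }
    }
}
```

Both functions compute the same result with the same summation order; whether the hand transformation is worth its cost in readability depends on whether the compiler would have done the strength reduction anyway, which is the crux of the disagreement.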
As it was said:

# optimizing C compilers *can* produce better code than non-optimizing
# ones, *even when the C source is, in some sense, "good" code.*

It is a platitude that *optimization* improves all kinds of code if code generation is done cursorily; moreover, I don't dispute that *aggressive* optimization may somewhat improve even well-written code. My contention was that the improvement, if any, is usually small, and the cost is large. This price/performance ratio goes against the grain of a language like C, which was designed under the philosophy that _simpler_ is better than _a bit faster_, and which owes much of its success to this philosophy. My argument against volatile and in favour of register is based on this philosophy and on assessing the cost/benefit ratio, not on a claim that the benefit is zero.

Note also that I do not deny the cost effectiveness of aggressive optimization in general, either; in languages of much higher level than C that were designed for it (such as less procedural ones, e.g. SQL/Quel), an optimizer may try to "understand" the program/algorithm, because it is meant to, because the language has been designed for it, or because there is no other way to get efficiency.

# Looking at the assembler output showed that the optimizer was
# working on the floating point part of the code, while the gruesome
# pointer stuff did the addressing faster.

While I do not agree with the "gruesome" adjective ;-}, I again interpret your observations differently. In your latest (interesting) message, as I understand it, you observe that since the 386 uses a floating point coprocessor with a stack architecture, declaring floating point variables register cannot mean much, so it is the compiler that has to do the grunt work. You also observe that, given the peculiar structure and small number of 386 registers (most of them specialized in one way or another), the code generator must take into account some constraints that can conflict with register.
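To make concrete what the two qualifiers being weighed here actually do, a minimal sketch follows; the function names `sum_ints`, `on_signal` and `poll_until_signal` are hypothetical and not from the thread. `register` is the programmer telling the compiler which values to keep in registers; `volatile` is the programmer telling a compiler that caches values on its own initiative that it must not do so for a particular object. The C standard only guarantees well-defined behaviour when a signal handler writes a static object of type `volatile sig_atomic_t`, which is why the flag below is declared that way.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Programmer-directed caching: the scan pointers and the running total
   are declared register, so even a simple code generator that honours
   the hint can keep them in registers for the whole loop. */
long sum_ints(const int *v, size_t n)
{
    register const int *p = v, *end = v + n;
    register long total = 0;

    while (p < end)
        total += *p++;
    return total;
}

/* Flag written asynchronously by a signal handler. Without volatile, an
   optimizer could keep it in a register across iterations of a loop that
   makes no calls and never see the handler's write; volatile forces every
   read to go back to memory. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Installs the handler, raises SIGINT at iteration 'when' of a polling
   loop (raise() runs the handler before returning), and returns the
   number of iterations executed, i.e. when + 1, or -1 on failure. */
int poll_until_signal(int when)
{
    int iterations = 0;

    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    while (!got_signal) {
        if (iterations == when)
            raise(SIGINT);
        iterations++;
    }
    return iterations;
}
```

The `raise()` call sits inside the loop only so that the example can run synchronously; the case volatile really exists for is a loop with no calls, polling a flag set by an asynchronous signal or by hardware.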
To this the following two observations apply:

[1] the Intel architecture is notoriously braindamaged. C was not meant to run equally well on any and every machine (admittedly this is a weaker argument than I would like it to be).

[2] what the compiler is doing is not *aggressive* optimization; it is merely doing a competent job of code generation for the *machine* at hand. I cannot imagine an aggressive optimizer having much success with an x86/x87 combination, where caching a variable in a register across expressions is hard because there are so few registers and most are specialized, and where the floating point unit hasn't really got general-purpose registers, being essentially stack based.

# P.S. The posted pointer code looks, and is, sickening.

It is, it is. But let me make excuses for it:

[1] the straightforward matrix multiply looks sickening to me as well, not in layout but in structure, because it is a direct transcription of the mathematical definition of the matrix product, and to me programming is not 1:1 with mathematics (at least in C).

[2] in the interest of compactness I omitted all my usual layout paraphernalia (and more besides), which I beg to submit would have made a huge difference to the readability of the pointer example. I have posted, in the indentation styles thread in this newsgroup, an example of something close to my usual program layout style.

[3] the pointer example, even in its crude form, in my opinion demonstrates the structure of the algorithm better, in that it makes explicit that there are at least two major levels of abstraction: scanning the matrices for rows and columns, and scanning each row and column for the inner product that generates each element of the result.

I may find the time to post, just for fun, how I would have *really* written both examples.

# But I have started using similar stuff in my own code. It has its advantages :->.
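Point [3] above, the split of matrix multiplication into two levels of abstraction, can be made explicit by giving the inner level a function of its own. This is a hypothetical sketch, not the "really written" version promised in the posting: the names `dot` and `matmul_structured`, the stride parameters, and the flat row-major layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Lower level: inner product of two n-element vectors, each given as a
   pointer to its first element and the stride between successive
   elements. The pointers are advanced only while elements remain. */
static double dot(const double *x, size_t xstride,
                  const double *y, size_t ystride, size_t n)
{
    double sum = 0.0;

    while (n > 0) {
        sum += *x * *y;
        if (--n > 0) {
            x += xstride;
            y += ystride;
        }
    }
    return sum;
}

/* Upper level: scan a by rows and b by columns; each element of the
   n x n result is the inner product of one row of a (stride 1) and one
   column of b (stride n). Flat row-major arrays; c must not overlap a
   or b (only exact aliasing is checked here). */
void matmul_structured(size_t n, const double *a, const double *b, double *c)
{
    size_t i, j;

    assert(c != a && c != b);
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            c[i * n + j] = dot(a + i * n, 1, b + j, n, n);
}
```

The stride arguments are what let one inner-product routine walk both a row and a column; the same `dot` would serve, unchanged, for other stride-based operations such as a matrix-vector product.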
# ( Yes, I stuck in the lines
#
# #ifdef CRAY
#     puts("The matrix multiply may not vectorize");
# #endif )

Point well taken ;-} ;-}. Still, at the risk of starting another discussion, let me say that I do not think that C is a "good for everything" tool. If you want to vectorize, you may be far better off with another, more appropriate language (APL? :->).

Another recently discussed dpANS C feat (euphemism :->), the introduction of novel (euphemism :->) semantics for [] in parameters, is relevant here, and has already been discussed elsewhere. I understand very well, as Doug Gwyn said, that X3J11 is nearly as political a body as a House public works committee (:->), but I venture to suggest (euphemism :->) that maybe it should not have tried to please so many constituencies (have you noticed, by chance, that the size of Harbison & Steele, which tries to be really thorough, has nearly doubled? Does not dpANS C have the taste of an omnibus appropriations bill? :->).

A final note: I am in the process of switching machines. I have a backlog of mail to reply to and postings to discuss (not least Doug Gwyn's "sleazy ploys" one, which cost me some sleep :-<). I have a backlog of work to do. The irrational early reactions to my postings have consumed a lot of my time and stamina. However, now that this debate is making some progress (e.g. I have learned something from McDonald's disagreement, even if I have not been convinced), I promise/threaten (;-}) to try to devote some time to it still, for the sake of argument. I am happy that somebody, whether agreeing or disagreeing with me, has started to address the issues. I hope that net bandwidth will now be used less (because I will not have to defend myself against insults) and better.

Happy New Year!
-- 
Piercarlo "Peter" Grandi                   INET: pcg@cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science   UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)