Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!gatech!hubcap!ncrcae!ncrlnk!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.lang.c
Subject: Re: const, volatile, etc [was Re: #
Summary: But on other machines the optimizer has little effect
Message-ID: <475@aber-cs.UUCP>
Date: 2 Jan 89 21:32:32 GMT
References: <715@auspex.UUCP>     <225800102@uxe.cso.uiuc.edu>
Reply-To: pcg@cs.aber.ac.uk
Distribution: eunet,world
Organization: UCW,Aberystwyth,WALES,UK
Lines: 180

[This was meant to be posted a few days ago, but failed...]

In article <225800102@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:

#       To put it bluntly, that's rubbish.  There are plenty of people out there
#       who can, I suspect, testify that yes, indeed, optimizing C compilers
#       *can* produce better code than non-optimizing ones, *even when the C
#       source is, in some sense, "good" code.*

#   I can and I will so testify. (I sent the exact timings to the persons
#   involved by mail, but maybe it really is generally interesting.)

It is interesting, even if I see the same things differently.

#   I tried compiling the original anti-optimizer flamer's matrix
					          ^^^^^^
*Me* being a flamer? Surely you jest... :->  Let me insist that while
I am highly opinionated, and love a tough debate, I tend to support my
opinions with reasonable arguments, and I have admitted my errors when
they were reasonably argued. Flaming (as I understand the word)
denotes a more purely emotional kind of expression, and somebody other
than me has been indulging in it.

Also, "anti-optimizer" is too coarse and general a label for my argument. It
was against *aggressive optimizers* for *C*. Hang on...

#   code with optimizations turned off. It took about 24 seconds to
#   run.  Putting in his pointer optimizations reduced the time to
#   16 seconds, as did running the optimizer on the original. BUT,
#   running the optimizer on the pointerized code, the executable took
#   8 seconds.

I did try with g++ after receiving the results that Doug McDonald had the
patience to obtain and send me. What I noticed was that my pointer code took
the same time with or without optimization, and that it ran almost twice as
fast as the optimized version of the subscripted matrix code (note that I
used g++ 1.27, which does not yet do strength reduction).
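To make "strength reduction" concrete: it replaces a multiplication that is
recomputed on every loop iteration with a running addition. The following is
a minimal sketch, not the pointer code from the earlier posting (which is not
reproduced here), and the function names are hypothetical. Both functions sum
one column of a row-major matrix; the second does by hand, with a row
pointer, roughly what a strength-reducing compiler would do to the first.

```c
#include <assert.h>
#include <stddef.h>

/* Subscripted form: m[i * ncols + col] implies a multiplication
   on every iteration unless the compiler strength-reduces it. */
double column_sum_index(const double *m, size_t nrows, size_t ncols,
                        size_t col)
{
    double sum = 0.0;
    size_t i;

    assert(col < ncols);
    for (i = 0; i < nrows; i++)
        sum += m[i * ncols + col];
    return sum;
}

/* Hand strength-reduced form: the multiplication is replaced by
   adding ncols to a row pointer on each step. */
double column_sum_ptr(const double *m, size_t nrows, size_t ncols,
                      size_t col)
{
    double sum = 0.0;
    const double *row = m;
    size_t i;

    assert(col < ncols);
    for (i = 0; i < nrows; i++, row += ncols)
        sum += row[col];
    return sum;
}
```

Whether a given compiler already makes the two equivalent depends on the
compiler and its flags; without strength reduction, the first form
presumably keeps its per-iteration multiply.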

My opinion is that McDonald's compiler was generating pretty rough code
without the optimizer, and that the optimizer simply made it more reasonable.

There is a good tradition (did it start with C?) of building compilers with a
poor but simple and *easy to debug* code generator, and then tacking an
optional code improver/simplifier onto its end. This module has
traditionally been called an *optimizer*, and I have no argument with it,
except that in a sense it is misnamed.
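As a hedged illustration of such a tacked-on improver, here is a toy
peephole pass over the output of a simple code generator. The instruction
format and the single rule are hypothetical, chosen only to show the shape of
the technique: look at the generated code through a small window and delete
instruction sequences that are locally redundant.

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical three-field instruction, standing in for the
   output of a simple code generator. */
enum opcode { OP_LOAD, OP_STORE, OP_ADD };

struct insn {
    enum opcode op;
    int reg;   /* register number */
    int addr;  /* memory operand (unused by OP_ADD) */
};

/* One peephole rule: "STORE r, x" immediately followed by
   "LOAD r, x" leaves r unchanged, so the LOAD can be dropped.
   Compacts the array in place and returns the new length. */
size_t peephole(struct insn *code, size_t n)
{
    size_t in, out = 0;

    for (in = 0; in < n; in++) {
        if (out > 0 && code[in].op == OP_LOAD
            && code[out - 1].op == OP_STORE
            && code[out - 1].reg == code[in].reg
            && code[out - 1].addr == code[in].addr)
            continue;           /* redundant reload: skip it */
        code[out++] = code[in];
    }
    return out;
}
```

The rule is only safe if location x cannot change behind the compiler's
back; for a memory-mapped device register it would silently drop a needed
read, which is the kind of case volatile was introduced to flag.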

Let me state for the nth time that I am not against *optimizers*, and I don't
advocate sloppy code generators. I am against *aggressive* optimizers. I
don't think their results are worth the effort, the cost, and the
reliability problems. I have said that an *optimizer* is something that does
a competent job of code generation by exploiting *language* and *machine*
dependent opportunities (I even ventured to suggest a classification of
good/bad optimizations).

To me, aggressive optimization is optimization that attempts to exploit
aspects of the *program* or *algorithm*. I do not like it because it requires
the optimizer to "understand" the program in some sense. I reckon that this
is the programmer's job; an optimizer can only "understand" static aspects of
a program and of the algorithm embodied in it; and experience suggests that
KISS applies to compilers too, i.e. the more intelligent an optimizer is, the
buggier it is likely to be and the harder its bugs are to find.
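One possible illustration of the distinction, assuming that hoisting a
function call out of a loop counts as the program-level kind of
optimization: to move strlen() out of the loop below, a compiler must
establish that strlen has no side effects and that the loop never modifies
s, whereas the programmer simply knows it. The function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* strlen(s) sits in the loop test, so unless the compiler proves it
   loop-invariant it is recomputed on every iteration, making the
   loop quadratic in the length of s. */
size_t count_char_naive(const char *s, char c)
{
    size_t i, count = 0;

    assert(s != NULL);
    for (i = 0; i < strlen(s); i++)
        if (s[i] == c)
            count++;
    return count;
}

/* The programmer knows the loop does not modify s, so the length
   is computed once, before the loop. */
size_t count_char_hoisted(const char *s, char c)
{
    size_t i, count = 0;
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    for (i = 0; i < len; i++)
        if (s[i] == c)
            count++;
    return count;
}
```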

As was said:

#	optimizing C compilers *can* produce better code than non-optimizing
#	ones, *even when the C source is, in some sense, "good" code.*

It is a platitude that *optimization* improves all kinds of code if code
generation is done cursorily; moreover, I don't dispute that *aggressive*
optimization may somewhat improve even well-written code. My contention was
that the improvement, if any, is usually small, while the cost is large. This
price/performance ratio goes against the grain of a language like C, which
was designed under the philosophy that _simpler_ is better than _a bit
faster_, and which owes much of its success to this philosophy. My argument
against volatile and in favour of register is based on this philosophy and on
assessing the cost/benefit ratio, not on a claim that the benefit is zero.
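To make the volatile/register contrast concrete, here is a minimal sketch in
standard C. The names are hypothetical, and it shows only what each keyword
tells the compiler, not any specific proposal from this thread. With
volatile, the compiler must re-read the flag on every iteration because a
signal handler may change it; with register, it is the programmer who
declares which values are private to the loop and may live in registers.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* A flag set asynchronously by a signal handler. `volatile` obliges
   the compiler to re-read it from memory on every test; without it,
   an optimizer that keeps variables in registers could load it once,
   before the loop. */
static volatile sig_atomic_t stop_requested = 0;

static void handle_stop(int sig)
{
    (void)sig;
    stop_requested = 1;
}

/* Hypothetical helper: route SIGINT to handle_stop. */
void install_stop_handler(void)
{
    signal(SIGINT, handle_stop);
}

/* The `register` approach: the programmer names the values that are
   private to this loop and may safely be kept in registers. */
long sum_until_stopped(const long *v, size_t n)
{
    register const long *p = v;
    register const long *end = v + n;
    register long sum = 0;

    while (p < end && !stop_requested)
        sum += *p++;
    return sum;
}
```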

Note also that I do not deny that aggressive optimization can be cost
effective in general; in languages of much higher level than C that are
designed for it (such as less procedural ones, e.g. SQL/Quel), an optimizer
may try to "understand" the program/algorithm, because it is meant to,
because the language has been designed for it, or because there is no other
way to get adequate efficiency.

#   Looking at the assembler output showed that the optimizer was
#   working on the floating point part of the code, while the gruesome
#   pointer stuff did the addressing faster.

While I do not agree with the adjective "gruesome" ;-}, I again interpret
your observations differently. As I understand your latest, interesting,
message, you observe that since the 386 uses a floating point coprocessor
with a stack architecture, register declarations cannot mean much for
floating point values, so it is the compiler that has to do the grunt work.
You also observe that, given the peculiar structure and small number of 386
registers (most of them specialized in one way or another), the code
generator must take into account constraints that can conflict with register
declarations.

To this, the following two observations apply:

[1] the Intel architecture is notoriously brain-damaged. C was not meant to
run equally well on any and every machine (admittedly this is a weaker
argument than I would like it to be).

[2] what the compiler is doing is not *aggressive* optimization; it is merely
doing a competent job of code generation for the *machine* at hand. I cannot
imagine an aggressive optimizer having much success with an x86/x87
combination, where caching a variable in a register across expressions is
hard to do because there are so few registers and most are specialized, and
where the floating point unit has no real registers, being essentially
stack based.

#   P.S. The posted pointer code looks, and is, sickening.

It is, it is. But let me make excuses for it:

[1] the straightforward matrix multiply looks sickening to me as well,
though in structure rather than layout, because it is a direct transcription
of the mathematical definition of the matrix product, and to me programming
does not map 1:1 onto mathematics (at least in C).

[2] in the interest of compactness I omitted all my usual layout
paraphernalia (and more), which I beg to submit would have made a huge
difference to the readability of the pointer example. I have posted, in the
indentation styles thread in this newsgroup, an example of something close
to my usual program layout style.

[3] the pointer example, even in its crude form, in my opinion shows the
structure of the algorithm better, in that it makes explicit that there are
at least two major levels of abstraction: scanning the matrices for rows and
columns, and scanning those rows and columns for the inner product that
generates each element of the result.
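A minimal sketch of that two-level structure, written afresh for
illustration (it is not the pointer code posted earlier, and the function
names are hypothetical): an outer function scans the rows of A and the
columns of B, and an inner function computes each inner product by walking
pointers. Matrices are assumed to be stored row-major in flat arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Inner level: inner product of one row of A (n contiguous elements)
   with column j of B, where B has p columns. brow steps one row of B
   at a time, so no k * p + j index is ever computed. */
static double inner_product(const double *arow, const double *b,
                            size_t n, size_t p, size_t j)
{
    double sum = 0.0;
    const double *end = arow + n;
    const double *brow = b;

    for (; arow < end; arow++, brow += p)
        sum += *arow * brow[j];
    return sum;
}

/* Outer level: C = A * B, with A m x n, B n x p, C m x p. */
void matmul(const double *a, const double *b, double *c,
            size_t m, size_t n, size_t p)
{
    const double *arow = a;
    size_t i, j;

    assert(a != NULL && b != NULL && c != NULL);
    for (i = 0; i < m; i++, arow += n)
        for (j = 0; j < p; j++)
            *c++ = inner_product(arow, b, n, p, j);
}
```

Keeping the two levels in separate functions makes each one short enough to
check by eye, at the price of a function call per element that a compiler
may or may not inline.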

I may find the time to post, just for fun, how I would have *really* written
both examples.

#   But I have started using similar stuff in my own code.

It has its advantages :->.

#    ( Yes, I stuck in the lines
#
#   #ifdef CRAY
#      puts("The matrix multiply may not vectorize");
#   #endif )

Point well taken ;-} ;-}. Still, at the risk of starting another discussion,
let me say that I do not think that C is a "good for everything" tool. If you
want to vectorize, you may be far better off with a more appropriate
language (APL? :->).

Another recently discussed dpANS C feat (euphemism :->), the introduction of
novel (euphemism :->) semantics for [] in parameters, is relevant here, and
has already been discussed elsewhere. I understand very well, as Doug Gwyn
said, that X3J11 is nearly as political a body as a House public works
committee (:->), but I venture to suggest (euphemism :->) that maybe it
should not have tried to please so many constituencies (have you noticed by
any chance that Harbison & Steele, which tries to be really thorough, has
nearly doubled in size? Doesn't dpANS C have the taste of an omnibus
appropriations bill? :->).

A final note: I am in the process of switching machines. I have a backlog of
mail to reply to and postings to discuss (not least Doug Gwyn's "sleazy
ploys" one, which cost me some sleep :-<). I also have a backlog of work to
do. The irrational early reactions to my postings have consumed a lot of my
time and stamina.

However, now that this debate is making some progress (e.g. I have learned
something from McDonald's disagreement, even if I have not been convinced),
I promise/threaten (;-}) to keep devoting some time to it, for the sake of
argument. I am happy that somebody, whether agreeing or disagreeing with me,
has started to address the issues. I hope that net bandwidth will now be
used less (because I will not have to defend myself against insults) and
better.

Happy New Year!
-- 
Piercarlo "Peter" Grandi			INET: pcg@cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science	UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)