Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!think!yale!cmcl2!lanl!lambda!jlg
From: jlg@lambda.UUCP (Jim Giles)
Newsgroups: comp.lang.misc
Subject: 'register' variables and other goodies (was Re: Common subexpression optimization)
Message-ID: <14226@lambda.UUCP>
Date: 6 Feb 90 01:00:40 GMT
References: <PCG.90Feb5163252@rupert.cs.aber.ac.uk>
Lines: 184

From article <PCG.90Feb5163252@rupert.cs.aber.ac.uk>, by pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi):
> [... 'register' is only a 'noalias' substitute - I said ...]
>
> Well, there is also actually a strong *hint* that the variable will
> be heavily *dynamically* used across statement boundaries, in the
> current scope. The compiler can use this hint very effectively.

Such a hint is only useful to an exceptionally dumb compiler.  Most
modern compilers do better data flow analysis than human programmers
are willing to do. Studies have shown that compilers can consistently
do a _very_ good job of register allocation without such hints.  In fact,
most C compilers simply ignore the 'register' attribute except to verify
that the variable is never used with an 'address-of' (&) operator.

In the overwhelming majority of programming environments, the only way
to beat the compiler's code is to switch to assembly.  The use of
'register' or other such tricks is insignificant (sometimes even
damaging).

> [...]
> A careful programmer will never declare a variable for a scope larger
> than that in which it is used, [...]

I disagree.  A careful programmer will neither write extremely monolithic
code nor write code that is fragmented into lots of little scopes.  If
I have a 50-line function which uses XYZ only in the middle third, I'm
not likely to make a new scope just for XYZ.  I _may_ split out the
middle third _if_ it seems to be a distinct and nearly independent
segment (but then I would ask myself whether the code should all be in
the same function at all).

> [...]                          will never reuse a variable with the same
> name for two different roles.

Now, I agree with that.  However, I'd add one caveat: in most languages,
including C, the idiom for loop index variables is a single-letter name.
Such variables are often reused in separate loops (with non-overlapping
scopes, of course) without any confusion that I've ever noticed.  Of
course, you _could_ argue that the variable _isn't_ being used in two
different roles: it's always an index variable.
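A minimal C sketch of the idiom, using a hypothetical function fill_and_sum and C89-style declarations at the top of the function: one index, i, serves two loops whose lifetimes never overlap, and in both it plays the same role.

```c
#include <stddef.h>

/* Hypothetical example: fill a[0..n-1] with 0, 1, 2, ... and return
   their sum.  The single index i is declared once and reused by two
   separate loops; in both it is simply "the array index". */
long fill_and_sum(int *a, size_t n)
{
    size_t i;
    long sum = 0;

    for (i = 0; i < n; i++)      /* first loop: i indexes a[] */
        a[i] = (int)i;

    for (i = 0; i < n; i++)      /* second loop: same name, same role */
        sum += a[i];

    return sum;
}
```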

It is interesting that you consider using the same name for two different
things to be very evil and yet you consider having two names for a single
object (aliasing) to be acceptable.  For most people, the degree of evil
is the other way 'round.

> [...]
> 	Of course this is because I am eccentric enough to think
> 	that the performance characterization of a program is
> 	part of its design and pragmatics should be as obvious
> 	as semantics.

One should only degrade the readability of code with performance-enhancing
transformations when absolutely necessary: that is, when the performance
of the code would otherwise be unsatisfactory.  This usually applies only
to a very small part of the code for any given project.  Most of the code
is under another sort of optimization pressure altogether: the pressure
to work correctly, to get written quickly, to be maintainable, to be
easy to enhance when new demands are made on the program, etc.  Even
the code that needs to work _FAST_ should first be written as clearly
as possible and only _then_ optimized.

> [... Other languages don't usually _need_ 'register' ...]
> Let me differ. Fortran has had equivalence and common forever,
> and even if dirty tricks are prohibited in theory, most compilers
> cannot assume users are well behaved [...]

There is a difference between aliasing and storage association.  Common
blocks can cause storage association between different objects - but _NOT_
within the same scope.  The only kind of aliasing possible with COMMON
is passing a common variable through the argument list to a routine that
also sees it in COMMON.  But this is illegal if the variable is modified
through either name!  And most compilers _DO_ assume that such aliasing
has not occurred.

As for equivalence: that is a _LOCAL_ declaration.  The compiler can clearly
see what variables are aliased and what variables aren't.  There is no need
for a 'register' attribute to declare this information, the compiler is
already explicitly aware of it.

Fortran 90, on the other hand, has introduced pointers which can point
to other (non-dynamic) objects.  In this language, something analogous
to the 'register' attribute is needed.  The method chosen by the
committee is the mirror image of the C solution: Fortran 90 has the
POINTEE attribute (TARGET in the published standard), which tells the
compiler that an object _may_ be aliased.  This means that the default
attribute of most Fortran 90 objects is effectively 'register'.

> [...]          Pascal has 'var' parameters,

Yes, but standard Pascal has no separate compilation.  The compiler can do a
complete interprocedural dataflow analysis to find out _unambiguously_ which
arguments might be aliased with which global variables.

> [...]                                       and, more murkily, variant
> records without discriminant.

C _also_ has variant records without discriminants!  They're called unions.
The kinds of problems caused by such things are usually type-coercion
problems and have nothing to do with 'register' attributes or aliasing
in the hidden sense.  After all, the declaration of such a union is clearly
visible in any scope that can reference any field in it, so the compiler
can already determine that overlapping fields _might_ be aliased (just like
Fortran's EQUIVALENCE, in fact).
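A minimal C sketch of the point, assuming a hypothetical union word and function poke_low_byte: since the union type is visible wherever either member is used, the compiler knows that a store through bytes[] may change u. (Reading a member other than the one last stored is implementation-defined in C89; later standards describe it as reinterpreting the stored bytes.)

```c
/* Hypothetical "variant record without discriminant": both members
   share the same storage, much like variables overlaid by EQUIVALENCE. */
union word {
    unsigned int  u;
    unsigned char bytes[sizeof(unsigned int)];
};

/* Store through one member, then read the overlapping one.  Because the
   union declaration is in view, the compiler cannot assume w->u is still
   0 after the store to w->bytes[0]. */
unsigned int poke_low_byte(union word *w, unsigned char b)
{
    w->u = 0;
    w->bytes[0] = b;   /* overwrites part of w->u */
    return w->u;
}
```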

> [...]
> the absence or presence of 'noalias' (and 'volatile') does change the
> semantics of a program, while for 'register' this is not true; [...]

This is false.  The 'register' attribute should have no effect on the
_semantics_ of a correct program.  If you don't use the address-of (&)
operator on a variable, the presence or absence of 'register' on that
variable should have NO effect on the semantics of the code.  If it does,
I suspect that you have a broken compiler!
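A minimal C sketch (hypothetical function names) of two functions that differ only in the 'register' storage class. Since neither takes the address of its locals, a conforming compiler must give both the same result; applying & to a 'register' variable is a constraint violation that the compiler has to diagnose.

```c
/* Plain locals: the compiler is free to keep i and sum in registers. */
long sum_plain(const int *a, int n)
{
    int i;
    long sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Identical except for 'register'; the meaning of the code is unchanged. */
long sum_register(const int *a, int n)
{
    register int i;
    register long sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```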

> [...]                                               'register'
> also gives a *positive* hint on usage frequency.

Yes, one that is practically useless on a good compiler.  In fact, if
the compiler makes an effort to put your 'register' vars into registers,
it may actually _inhibit_ optimal register utilization.

The fact of the matter is, usage frequency is one of the things I expect
the language environment to tell ME, not the other way around.  The
compiler and the run-time profiler are where that information comes
from - not from my gut feeling.  Now, if the profiler could feed
information back into the compiler for a subsequent compile, THAT might
be a useful feature.

> [...]
> With 'register' safety (no aliasing) is a side effect, but a
> clever one, under more than one aspect.

Actually, no aliasing is about the _only_ useful feature of the 'register'
attribute.  And it is one that is basically unneeded in other languages.
Even so, it's not all that useful, unless you like to explicitly copy in
and copy out all the global variables you wish to use.  And it doesn't
help with array manipulation at all (arrays are still accessed through
pointers, which are assumed to alias everything that's not 'register').
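A minimal C sketch of the copy-in/copy-out style, assuming a hypothetical global total and functions add_all and add_all_cached. In the first version a[] might point at total, so the compiler generally has to write total back to memory on every iteration; in the second, the 'register' local t can never have its address taken and so may stay in a register for the whole loop.

```c
#include <stddef.h>

long total = 0;   /* hypothetical global accumulator */

/* Direct version: a[i] might read 'total', so each update of 'total'
   must reach memory before the next load of a[i]. */
void add_all(const long *a, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];
}

/* Copy-in/copy-out version: the unaliasable local t carries the sum.
   Writing it this way is the programmer's assertion that a[] does not
   overlap 'total'. */
void add_all_cached(const long *a, size_t n)
{
    register long t = total;   /* copy in */
    size_t i;
    for (i = 0; i < n; i++)
        t += a[i];
    total = t;                 /* copy out */
}
```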

> [...]
> What ruins the alias show for most languages is either separate compilation
> or parameter passing, even where pointers are not present.

It's not either-or.  The problem arises only with _both_ parameter passing
and separate compilation.  Without separate compilation, the compiler can
look at the whole call tree to detect any possible aliasing.  Without call
by reference there is no problem (and this is the only parameter passing
mechanism which causes the problem).
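A minimal C sketch, with a hypothetical function store_then_load: C gets call by reference by passing addresses, and once the function is compiled separately from its callers it cannot know whether a and b name the same object.

```c
/* Both parameters are addresses (C's form of call by reference).
   The final load cannot be folded to the constant 1: if a == b,
   the store through b has overwritten *a. */
int store_then_load(int *a, int *b)
{
    *a = 1;
    *b = 2;
    return *a;
}
```

With distinct variables, store_then_load(&x, &y) returns 1; with store_then_load(&x, &x) it returns 2.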

Even so, a run-time test for aliasing would be cheap and easy to implement,
but the current lack of interest indicates to me that there really isn't
much of a problem.  Compilers _DO_ optimize as if such aliasing has not
taken place and they _DON'T_ usually test for aliasing - yet the frequency
of such errors is very small.
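A minimal C sketch of such a run-time test, assuming hypothetical helpers ranges_overlap and add_arrays. One comparison at entry selects either a loop that hoists two loads above the corresponding stores (legal only when the arrays are disjoint) or a conservative in-order loop. The pointers are compared as uintptr_t values because relational comparison of pointers into different objects is undefined in standard C; the pointer-to-integer mapping is implementation-defined, but preserves address order on flat-memory machines.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: do the n-element ranges at p and q overlap? */
static int ranges_overlap(const double *p, const double *q, size_t n)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t b = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)(n * sizeof(double));
    return a < b + bytes && b < a + bytes;
}

/* dst[i] += src[i] for i in [0, n). */
void add_arrays(double *dst, const double *src, size_t n)
{
    size_t i = 0;

    if (!ranges_overlap(dst, src, n)) {
        /* Fast path: the arrays are disjoint, so both loads may be
           issued before either store. */
        for (; i + 1 < n; i += 2) {
            double s0 = src[i];
            double s1 = src[i + 1];
            dst[i] += s0;
            dst[i + 1] += s1;
        }
    }
    /* Conservative path (and any leftover element from the fast path):
       strict element-by-element order. */
    for (; i < n; i++)
        dst[i] += src[i];
}
```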

> [...]                                                       And even if
> this is not true, the compiler still has to guess which of the safe variables
> are actually worth caching.

But this is something that the compiler is usually _very_ good at.  Better
than most people have time to be.  (And, if you _do_ have time, you're
better off going to assembly, where your register declarations aren't hints
but _orders_!)

> [...]
> 	Naturally this is a moot point on many of today's architectures,
> 	where register optimization is entirely unnecessary, as there are usually
> 	far more registers available than variables [...]

Additional registers aren't a panacea.  For one thing, as the number of
registers increases, register allocation schemes have gone 'cosmopolitan':
there is an increasing effort to keep data in registers across procedure
calls.  The less efficiently code uses the registers, the less efficiently
it runs, even if there are a massive number of registers.

For another thing, register utilization is not the only thing inhibited
by aliasing!  All those multi-register machines you're talking about are
also likely to be pipelined.  Possible aliasing inhibits pipelining as
badly as (or worse than) it inhibits register utilization.  And there's
NO source-level control (in C or anywhere else) over code ordering
optimizations at the pipelining level.  The best thing is for the language
to be designed in such a way that aliasing is difficult and rare, and is
only forced upon the user when it is actually part of the functionality
he needs.
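A minimal C sketch (hypothetical names, fixed block of four for brevity) of why possible aliasing blocks reordering: the second function issues all its loads before any store, the schedule a pipelined machine would prefer. The two agree when dst and src are disjoint, but not when dst overlaps src.

```c
#include <stddef.h>

/* Source order: dst[i] is stored before src[i + 1] is loaded. */
void shift_add_inorder(double *dst, const double *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i] + 1.0;
}

/* Pipelined schedule (hypothetical, fixed length 4): all loads are
   issued before any store so their latencies can overlap.  It is only
   guaranteed to match the loop above when dst and src do not overlap. */
void shift_add_pipelined4(double *dst, const double *src)
{
    double s0 = src[0], s1 = src[1], s2 = src[2], s3 = src[3];
    dst[0] = s0 + 1.0;
    dst[1] = s1 + 1.0;
    dst[2] = s2 + 1.0;
    dst[3] = s3 + 1.0;
}
```

Starting from an all-zero array a[5], shift_add_inorder(a + 1, a, 4) leaves a[4] == 4.0, while shift_add_pipelined4(a + 1, a) leaves a[4] == 1.0: the compiler may only choose the faster schedule if it knows the arrays are not aliased.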

J. Giles