Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!cica!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!rpi!nyser!cmx!dl
From: dl@cmx.npac.syr.edu (Doug Lea)
Newsgroups: comp.lang.c++
Subject: Re: C++ vrs Fortran for linear algebra.
Message-ID: <1822@cmx.npac.syr.edu>
Date: 4 Aug 89 19:18:18 GMT
References: <3967@portia.Stanford.EDU> <508@gill.UUCP> <SCHWARTZ.89Jul30024428@shire.cs.psu.edu>
Reply-To: dl@cmx.npac.syr.edu (Doug Lea)
Organization: Northeast Parallel Architectures Center, Syracuse NY
Lines: 340


There is a thorny problem in attempting to write C++ matrix code
(among similar sorts of applications) that can be compiled, via an
optimizing compiler like g++, to run as fast as well-optimized Fortran
code. 

Instead of using the example of Dave Nichols & Dirk Grunwald,
I'll use a simpler example, that makes this clearer. In fact, the
example can be written using only C constructs, even though it is
definitely a C++-ish problem.

struct imatrix /* a matrix of integers */
{
  int  rows;   /* number of rows */
  int  cols;   /* number of columns */
  int* data;   /* pointer to rows*cols ints from freestore (or wherever) */
};

void fill(struct imatrix* mp, int x)
{
  int i, j;
  for (i = 0; i < mp->rows; ++i)
    for (j = 0; j < mp->cols; ++j)
      *(mp->data + i * mp->cols + j) = x; /* traverse in row-major order */
}
    

[In C++, you'd probably write this as something like

class Imatrix
{
private:
  int  rows;
  int  cols;
  int* data;
public:
       Imatrix(int r, int c) :rows(r), cols(c), data(new int[r * c]) {}
  int& operator() (int i, int j) { return *(data + i * cols + j); }
  void fill(int x)
  {
    for (int i = 0; i < rows; ++i)
      for (int j = 0; j < cols; ++j)
        (*this)(i, j) = x;
  }
};

but nothing I'll say depends on these differences.]

Fill() is an example of a function that cannot be optimized as well in
C or C++ as it could in other languages because the compiler cannot
safely assume that memory accesses through mp->data do not affect
mp->rows or mp->cols. Thus, neither mp->rows nor mp->cols may be
cached in a register, and very few other optimizations are possible.
This is easy to see by comparing the code generated for fill with that
for a similar fill2.

void fill2(int rows, int cols, int* data, int x)
{
  int i, j;
  for (i = 0; i < rows; ++i)
    for (j = 0; j < cols; ++j)
      *(data + i * cols + j) = x;
}

Using g++ (or gcc) (-O -fstrength-reduce), on a 1000 X 1000 imatrix,
the first fill is FIFTY TO EIGHT-HUNDRED PERCENT SLOWER than fill2,
across a few different machines and versions of g++/gcc. The problem
ought to show up most dramatically on RISC machines, where memory
access requires significant load delays over register access. Without
register caching, every single matrix reference must first retrieve
`mp->cols' from memory. Worse, no strength-reducing optimizations are
possible that would replace the i * mp->cols multiply by simple
address increments each time through the loop, as may be done (and is
done, in most recent versions of g++, on most machines) with fill2.

Thus, programming this kind of function in an `object-oriented' manner
results in an unacceptable performance hit.  (Of course, there is no
reason to traverse in row-major order in `fill', making it a poor
example in this sense, but the principle holds across more realistic
situations. The differences are slightly less dramatic in the
nichols/grunwald example because the time-consuming `sqrt' calculation
masks these effects to some extent on many machines.)

This is a very real, and very frustrating problem in g++. It is not
really a `problem' in AT&T cfront, since cfront generates C code; thus
nearly all optimization is in the hands of the backend C compiler, and
fewer such C++ (non-C) language issues impact the resulting code
efficiency. 

Nevertheless, I see the problem as a C++ issue, even though it also
occurs in C, because in C, you can always program things along the
lines of `fill2', while in C++, you cannot do this if you want to
simultaneously take advantages of the benefits of object-oriented
programming features.

Optimizers cannot pull the kinds of invariants needed to produce good
code out of thin air. There is nothing about the imatrix declaration
that allows a compiler to safely assume that `data' does not point to
the `rows' or `cols' fields. In fact, in the g++/gcc optimizer, even
if `data' is of type `double*', as would be the case in floating point
matrices, the optimizer still isn't sure that `data' doesn't point to
these fields, since a programmer could have applied a coercion (i.e.,
mp->data = (double*)(mp->rows)).

There *may* be one way out of this using existing C++ constructs. If
the `rows' and `cols' fields were declared to be `const',

struct imatrix { const int rows, const int cols, int* data; };

then the optimizer might be allowed to assume that access through
`data' couldn't affect them. However, even this is possibly unsafe,
since a programmer could still have legally applied a coercion (i.e.
mp->data = (int*)(mp->rows)). I cannot find anything officially stated
in the C++ reference manual that would enable a compiler to assume
that const fields are never altered via this kind of `side-channel'
access. Moreover, even if there were, using const fields in order
to guide optimization would preclude ever resizing the matrix.
In fact, this is probably the most important kind of drawback of using
`const' in such applications.

It seems that any other solution requires some kind of minor extension
in the C++ language. Here are my best three ideas about this:


                   *** Array References ***

I'll start with the simplest, but probably most controversial idea.
Suppose there were a way to change C++ so that so that the difference
between pointers/references to arrays and pointers/references to
single objects were *enforced*. For example if `data' were a pointer
or reference to an array of ints, it could not point to (refer to) a
single int.  Surprisingly, there is a tolerably natural syntax
available for this.  Consider

struct imatrix { int rows, int cols, int[]& data; }

Define this to mean that data *MUST* refer to an array. This would
cure the kinds of problems mentioned above with fill, since `data'
could not possibly refer (point) to `rows' or `cols'. 

`data' elements can be accessed in the usual way via data[i].

For good reason, this syntax bears strong resemblance to that
for `vector_new' (e.g., `new int[100]').

Unfortunately, there is also the need to support the `star' version
as well, mainly to support vector_new. `int[]* p' must mean that
p points to an array of ints, elementwise dereferenceable via (*p)[i].

This *might* be easier to implement than it first seems. I see only
a few major changes that would be required minimally:

    1) Any <type>[]& may be SILENTLY, AUTOMATICALLY coerced into a <type>*,
        without `wasting' a coercion (Recall the C++ only-one coercion rule).
    2) The opposite coercion <type>* to <type>[]& is NEVER performed,
        except by manual intervention. An optimizing compiler
        is allowed to ignore any aliasing to non-array objects created
        by such coercions.
    3) If `a' is declared as <type> a[<count>], the type of a
        mention of `a' is <type>[]&. (When coupled with (1), this
        change is transparent to older existing C & C++ code).
    4) `vector_new' must be retyped to return <type>[] *.
        and `vector_delete' should take a <type>[]*.

Note that `int[]* a' is VERY different from `int* a[]' (== `int** a') 
and that `int a[]' still means the same as `int* a'. 

The only bit of strangeness to this proposal is that `int[]& a' and
`int[]* p' serve only as idioms. Just using `int[] a' isn't
meaningful.

Note also that, depending on details that ought to be hammered out,
these rules seem to  allow for reference assignments, unlike
rules for regular references. I'm not sure whether this is good or
bad.

The construct might also be very useful for function arguments. In

void f(int[]& d, int& x);

the compiler knows that x isn't aliased by d (although x *could*
still be an element of d). More importantly, f will only take
arrays, not pointers to single objects, as arguments:

void usef()
{
  int a;
  int[]& b = *(new int[100]);
  int c[100];
  int* d = new int;
  int x;
  
  f(&a, x); // ERROR
  f(b, x);  // OK
  f(c, x);  // OK
  f(d, x);  // ERROR

  b = c;    //  what do you think?
}

If programmers used this construct consistently whenever using arrays,
they might be spared a great many pointer errors. Perhaps its greatest
attraction is that the construct significantly improves the type safety
of C++.

I think that this idea contains a lot of promise. However, it
sidesteps the basic aliasing issue. The next idea has the opposite
features.


             ***  Register struct/class fields ***

According to ANSI and the C++ reference manual, if an object is
declared as `register', there are two consequences: the compiler is
encouraged to allocate a register for it, and, even if not, its
address may not be taken. Thus a register variable or argument may
never have an alias.

With a good register-allocating compiler like g++/gcc, the storage
class specification aspect of `register' is almost archaic. Given
this, `register' is only useful in its second meaning, as a *type
qualifier*, in the same league as `const' and `volatile'.

This is in fact what I propose: That `register' be treated as a 
type qualifier with respect to struct/class components, as in

struct imatrix
{
  register int rows;
  register int cols;
  int* data;
};

So far as I can see, this usage of `register' does not create
any new C++ parsing difficulties.

Again, encouragement to place such fields in registers is only
indirectly related to the use of `register' in this context. The
meaning of interest would be that

For any struct imatrix imat, constructs of the form
     int* rp = &imat.rows
     int& rr = imat.rows;
would be illegal.

Constructs like
     imatrix* mp = &imat
would still be OK (unless, of course, imat itself were declared as register).

In other words, register fields may not have aliases, although the
structs they are a part of may be aliased.  Thus, imat's rows field
could be changed only by direct modification (imat.rows = x). An
imatrix* mp could have its rows changed either manually or indirectly
via any f(imatrix*) function. (I take it as irrelevant to the
definition of such fields as registers that `imat.rows = x' implicitly
requires an offset address to be computed inside the compiler.)

There is a very unfortunate snag: the cast
     int* p = (int*)(&imat);
would still enable side-channel modification (as in *p = 1234 causing
imat.rows to be 1234). There seem to be two routes in dealing with this.

A `weak' interpretation would disable optimization if any potential
anything* alias for m existed and were destructively accessed.
Sadly, the `weak' interpretation does not help out at all, since, for
example, fill's mp->data could be an alias for mp.  

A `strong' interpretation of `register' would be to assume that no
side-channel access occurs, treating any mp->rows as potentially
volatile only when mp (or a possible imatrix* alias) modifies rows or
is possibly changed via a f(imatrix*). It appears that a compiler
operating under such an interpretation would generate essentially
identical code for both fill and fill2.

This means that, in order to be useful, the `strong' interpretation
would be required, even though its assumptions cannot be enforced.  I
believe that it should be adopted anyway.  In fact, I think that an
optimizer should be able to assume lack of side channel access in all
three of the cases discussed:

    * when a field is marked as `register'
    * when a field is marked as `const'
    * when there is a pointer type mismatch (e.g., double* data vs int rows)

For the same reason that C++ compilers emit warnings when a const is
coerced to a non-const, it would be nice to have a warning if a
pointer to a struct containing register fields were coerced into
something else.

There are several other related niceties that could be introduced
to make this a stronger proposal. Entities other than struct
components could similarly profit from the `register-as-type-qualifier'
notion.

Note that this use of `register' is NOT the same as the proposed ANSI
`noalias' qualifier: `register' means `cannot have an alias', while
`noalias' means `does not have an alias'. This difference is actually
quite substantial: Noalias can be applied to pointers to indicate
uniqueness.  For example, 
    void f(noalias struct imatrix* a, noalias struct imatrix* b); 
means that *a and *b are guaranteed distinct. This condition is
extremely difficult to enforce, and, in any case applies to a whole
different set of problems (chiefly, parallelization issues). No
such effect seems inherent in any use of `register' as described here.


            *** Compile-time directives ***

It is possible, although incredibly tedious, to use some kind of
#pragmas or asserts that, if recognized by the compiler, would provide
the same kind information.  For example:

void fill(struct imatrix* mp, int x)
{
  assert(!((mp->data >= mp && mp->data < (char*)mp + sizeof(imatrix)) ||
          (mp->data + mp->rows * mp->cols >= mp && 
           mp->data + mp->rows * mp->cols < (char*)mp + sizeof(imatrix))))
  /* i.e., no requested offset of data == rows or cols */
  int i, j;
  for (i = 0; i < mp->rows; ++i)
    for (j = 0; j < mp->cols; ++j)
      *(mp->data + i * mp->cols + j) = x;
}

Pretty ugly, huh?


----

If anyone has any other ideas or comments on this, I'd love to hear them!
This issue is among the chief reasons that the libg++ matrix library
has not yet been released. I refuse to distribute a package that is
guaranteed to convince users that C++ is not a serious candidate
for fast Matrix computations! 


Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367
email: dl@oswego.edu              or dl%oswego.edu@nisc.nyser.net
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl