Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!cica!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!rpi!nyser!cmx!dl From: dl@cmx.npac.syr.edu (Doug Lea) Newsgroups: comp.lang.c++ Subject: Re: C++ vrs Fortran for linear algebra. Message-ID: <1822@cmx.npac.syr.edu> Date: 4 Aug 89 19:18:18 GMT References: <3967@portia.Stanford.EDU> <508@gill.UUCP> Reply-To: dl@cmx.npac.syr.edu (Doug Lea) Organization: Northeast Parallel Architectures Center, Syracuse NY Lines: 340 There is a thorny problem in attempting to write C++ matrix code (among similar sorts of applications) that can be compiled, via an optimizing compiler like g++, to run as fast as well-optimized Fortran code. Instead of using the example of Dave Nichols & Dirk Grunwald, I'll use a simpler example, that makes this clearer. In fact, the example can be written using only C constructs, even though it is definitely a C++-ish problem. struct imatrix /* a matrix of integers */ { int rows; /* number of rows */ int cols; /* number of columns */ int* data; /* pointer to rows*cols ints from freestore (or wherever) */ }; void fill(struct imatrix* mp, int x) { int i, j; for (i = 0; i < mp->rows; ++i) for (j = 0; j < mp->cols; ++j) *(mp->data + i * mp->cols + j) = x; /* traverse in row-major order */ } [In C++, you'd probably write this as something like class Imatrix { private: int rows; int cols; int* data; public: Imatrix(int r, int c) :rows(r), cols(c), data(new int[r * c]) {} int& operator() (int i, int j) { return *(data + i * cols + j); } void fill(int x) { for (int i = 0; i < rows; ++i) for (int j = 0; j < cols; ++j) (*this)(i, j) = x; } }; but nothing I'll say depends on these differences.] Fill() is an example of a function that cannot be optimized as well in C or C++ as it could in other languages because the compiler cannot safely assume that memory accesses through mp->data do not affect mp->rows or mp->cols. Thus, neither mp->rows nor mp->cols may be cached in a register, and very few other optimizations are possible. This is easy to see by comparing the code generated for fill with that for a similar fill2. void fill2(int rows, int cols, int* data, int x) { int i, j; for (i = 0; i < rows; ++i) for (j = 0; j < cols; ++j) *(data + i * cols + j) = x; } Using g++ (or gcc) (-O -fstrength-reduce), on a 1000 X 1000 imatrix, the first fill is FIFTY TO EIGHT-HUNDRED PERCENT SLOWER than fill2, across a few different machines and versions of g++/gcc. The problem ought to show up most dramatically on RISC machines, where memory access requires significant load delays over register access. Without register caching, every single matrix reference must first retrieve `mp->cols' from memory. Worse, no strength-reducing optimizations are possible that would replace the i * mp->cols multiply by simple address increments each time through the loop, as may be done (and is done, in most recent versions of g++, on most machines) with fill2. Thus, programming this kind of function in an `object-oriented' manner results in an unacceptable performance hit. (Of course, there is no reason to traverse in row-major order in `fill', making it a poor example in this sense, but the principle holds across more realistic situations. The differences are slightly less dramatic in the nichols/grunwald example because the time-consuming `sqrt' calculation masks these effects to some extent on many machines.) This is a very real, and very frustrating problem in g++. It is not really a `problem' in AT&T cfront, since cfront generates C code; thus nearly all optimization is in the hands of the backend C compiler, and fewer such C++ (non-C) language issues impact the resulting code efficiency. Nevertheless, I see the problem as a C++ issue, even though it also occurs in C, because in C, you can always program things along the lines of `fill2', while in C++, you cannot do this if you want to simultaneously take advantages of the benefits of object-oriented programming features. Optimizers cannot pull the kinds of invariants needed to produce good code out of thin air. There is nothing about the imatrix declaration that allows a compiler to safely assume that `data' does not point to the `rows' or `cols' fields. In fact, in the g++/gcc optimizer, even if `data' is of type `double*', as would be the case in floating point matrices, the optimizer still isn't sure that `data' doesn't point to these fields, since a programmer could have applied a coercion (i.e., mp->data = (double*)(mp->rows)). There *may* be one way out of this using existing C++ constructs. If the `rows' and `cols' fields were declared to be `const', struct imatrix { const int rows, const int cols, int* data; }; then the optimizer might be allowed to assume that access through `data' couldn't affect them. However, even this is possibly unsafe, since a programmer could still have legally applied a coercion (i.e. mp->data = (int*)(mp->rows)). I cannot find anything officially stated in the C++ reference manual that would enable a compiler to assume that const fields are never altered via this kind of `side-channel' access. Moreover, even if there were, using const fields in order to guide optimization would preclude ever resizing the matrix. In fact, this is probably the most important kind of drawback of using `const' in such applications. It seems that any other solution requires some kind of minor extension in the C++ language. Here are my best three ideas about this: *** Array References *** I'll start with the simplest, but probably most controversial idea. Suppose there were a way to change C++ so that so that the difference between pointers/references to arrays and pointers/references to single objects were *enforced*. For example if `data' were a pointer or reference to an array of ints, it could not point to (refer to) a single int. Surprisingly, there is a tolerably natural syntax available for this. Consider struct imatrix { int rows, int cols, int[]& data; } Define this to mean that data *MUST* refer to an array. This would cure the kinds of problems mentioned above with fill, since `data' could not possibly refer (point) to `rows' or `cols'. `data' elements can be accessed in the usual way via data[i]. For good reason, this syntax bears strong resemblance to that for `vector_new' (e.g., `new int[100]'). Unfortunately, there is also the need to support the `star' version as well, mainly to support vector_new. `int[]* p' must mean that p points to an array of ints, elementwise dereferenceable via (*p)[i]. This *might* be easier to implement than it first seems. I see only a few major changes that would be required minimally: 1) Any []& may be SILENTLY, AUTOMATICALLY coerced into a *, without `wasting' a coercion (Recall the C++ only-one coercion rule). 2) The opposite coercion * to []& is NEVER performed, except by manual intervention. An optimizing compiler is allowed to ignore any aliasing to non-array objects created by such coercions. 3) If `a' is declared as a[], the type of a mention of `a' is []&. (When coupled with (1), this change is transparent to older existing C & C++ code). 4) `vector_new' must be retyped to return [] *. and `vector_delete' should take a []*. Note that `int[]* a' is VERY different from `int* a[]' (== `int** a') and that `int a[]' still means the same as `int* a'. The only bit of strangeness to this proposal is that `int[]& a' and `int[]* p' serve only as idioms. Just using `int[] a' isn't meaningful. Note also that, depending on details that ought to be hammered out, these rules seem to allow for reference assignments, unlike rules for regular references. I'm not sure whether this is good or bad. The construct might also be very useful for function arguments. In void f(int[]& d, int& x); the compiler knows that x isn't aliased by d (although x *could* still be an element of d). More importantly, f will only take arrays, not pointers to single objects, as arguments: void usef() { int a; int[]& b = *(new int[100]); int c[100]; int* d = new int; int x; f(&a, x); // ERROR f(b, x); // OK f(c, x); // OK f(d, x); // ERROR b = c; // what do you think? } If programmers used this construct consistently whenever using arrays, they might be spared a great many pointer errors. Perhaps its greatest attraction is that the construct significantly improves the type safety of C++. I think that this idea contains a lot of promise. However, it sidesteps the basic aliasing issue. The next idea has the opposite features. *** Register struct/class fields *** According to ANSI and the C++ reference manual, if an object is declared as `register', there are two consequences: the compiler is encouraged to allocate a register for it, and, even if not, its address may not be taken. Thus a register variable or argument may never have an alias. With a good register-allocating compiler like g++/gcc, the storage class specification aspect of `register' is almost archaic. Given this, `register' is only useful in its second meaning, as a *type qualifier*, in the same league as `const' and `volatile'. This is in fact what I propose: That `register' be treated as a type qualifier with respect to struct/class components, as in struct imatrix { register int rows; register int cols; int* data; }; So far as I can see, this usage of `register' does not create any new C++ parsing difficulties. Again, encouragement to place such fields in registers is only indirectly related to the use of `register' in this context. The meaning of interest would be that For any struct imatrix imat, constructs of the form int* rp = &imat.rows int& rr = imat.rows; would be illegal. Constructs like imatrix* mp = &imat would still be OK (unless, of course, imat itself were declared as register). In other words, register fields may not have aliases, although the structs they are a part of may be aliased. Thus, imat's rows field could be changed only by direct modification (imat.rows = x). An imatrix* mp could have its rows changed either manually or indirectly via any f(imatrix*) function. (I take it as irrelevant to the definition of such fields as registers that `imat.rows = x' implicitly requires an offset address to be computed inside the compiler.) There is a very unfortunate snag: the cast int* p = (int*)(&imat); would still enable side-channel modification (as in *p = 1234 causing imat.rows to be 1234). There seem to be two routes in dealing with this. A `weak' interpretation would disable optimization if any potential anything* alias for m existed and were destructively accessed. Sadly, the `weak' interpretation does not help out at all, since, for example, fill's mp->data could be an alias for mp. A `strong' interpretation of `register' would be to assume that no side-channel access occurs, treating any mp->rows as potentially volatile only when mp (or a possible imatrix* alias) modifies rows or is possibly changed via a f(imatrix*). It appears that a compiler operating under such an interpretation would generate essentially identical code for both fill and fill2. This means that, in order to be useful, the `strong' interpretation would be required, even though its assumptions cannot be enforced. I believe that it should be adopted anyway. In fact, I think that an optimizer should be able to assume lack of side channel access in all three of the cases discussed: * when a field is marked as `register' * when a field is marked as `const' * when there is a pointer type mismatch (e.g., double* data vs int rows) For the same reason that C++ compilers emit warnings when a const is coerced to a non-const, it would be nice to have a warning if a pointer to a struct containing register fields were coerced into something else. There are several other related niceties that could be introduced to make this a stronger proposal. Entities other than struct components could similarly profit from the `register-as-type-qualifier' notion. Note that this use of `register' is NOT the same as the proposed ANSI `noalias' qualifier: `register' means `cannot have an alias', while `noalias' means `does not have an alias'. This difference is actually quite substantial: Noalias can be applied to pointers to indicate uniqueness. For example, void f(noalias struct imatrix* a, noalias struct imatrix* b); means that *a and *b are guaranteed distinct. This condition is extremely difficult to enforce, and, in any case applies to a whole different set of problems (chiefly, parallelization issues). No such effect seems inherent in any use of `register' as described here. *** Compile-time directives *** It is possible, although incredibly tedious, to use some kind of #pragmas or asserts that, if recognized by the compiler, would provide the same kind information. For example: void fill(struct imatrix* mp, int x) { assert(!((mp->data >= mp && mp->data < (char*)mp + sizeof(imatrix)) || (mp->data + mp->rows * mp->cols >= mp && mp->data + mp->rows * mp->cols < (char*)mp + sizeof(imatrix)))) /* i.e., no requested offset of data == rows or cols */ int i, j; for (i = 0; i < mp->rows; ++i) for (j = 0; j < mp->cols; ++j) *(mp->data + i * mp->cols + j) = x; } Pretty ugly, huh? ---- If anyone has any other ideas or comments on this, I'd love to hear them! This issue is among the chief reasons that the libg++ matrix library has not yet been released. I refuse to distribute a package that is guaranteed to convince users that C++ is not a serious candidate for fast Matrix computations! Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367 email: dl@oswego.edu or dl%oswego.edu@nisc.nyser.net UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl