Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 8/28/84; site lll-crg.ARPA
Path: utzoo!linus!philabs!cmcl2!seismo!umcp-cs!gymble!lll-crg!brooks
From: brooks@lll-crg.ARPA (Eugene D. Brooks III)
Newsgroups: net.lang.c
Subject: Re: Re: Vectorizing C compiler for the Cray
Message-ID: <510@lll-crg.ARPA>
Date: Sun, 7-Apr-85 01:45:50 EST
Article-I.D.: lll-crg.510
Posted: Sun Apr 7 01:45:50 1985
Date-Received: Tue, 9-Apr-85 00:48:50 EST
References: <9759@brl-tgr.ARPA>
Organization: Lawrence Livermore Labs, CRG group
Lines: 82

> It's also not clear that a "vectorizing C compiler" makes
> much sense, given the form of typical C code.

For the traditional uses of C (operating systems programming,
compilers, text editors ...) a vectorizing C compiler indeed does not
make much sense. C is not restricted to those uses, however, and it is
a very good language for numerical applications (modulo the
float-->double problem that is being fixed in the ANSI standard and has
been fixed in any compiler that I have used for numerical work).

I have been using C for numerical programming for 4 years now. Fortran
used to be the only language I used, and I have not used it for 4
years. I have even forgotten some of the keywords. I am not an isolated
case, as there is a small but growing community of scientists who are
using C instead of Fortran for their work. The data structures that can
be created in C make the layout of a typical program much cleaner and
more easily understood.

It is clear that for C to be used for numerical work on supercomputers
one must have a vectorizing C compiler, just as is the case for
Fortran. Consider the code below.

	float **a, **b, **c;
	int dim;
	int i,j,k;

	for(i = 0; i < dim; i += 1) {
		for(j = 0; j < dim; j += 1) {
			a[i][j] = 0.0;
			for(k = 0; k < dim; k += 1) {
				a[i][j] += b[i][k] * c[k][j];
			}
		}
	}

Considering the inner loop, one can see that a vector dot product is
being formed. The fetch of b[i][k] is a stride 1 vector fetch.
The fetch of c[k][j] is a gather fetch using an offset of j from the
array of pointers c[k]. This loop will vectorize on the Cray XMP48, the
Cray 2, the CDC 205, and the Convex C-1, among others. So why not have
a vectorizing C compiler available to the users of C on these machines?
The only valid argument against the use of C for numerical work on
supercomputers is the lack of a vectorizing compiler.

A "vectorizing" compiler, where one means that the compiler unrolls
loops to reduce the jump overhead and picks the best possible way to
get the work done on a given machine, is even useful on a scalar
machine such as a VAX. As an example, consider the following trick that
I have in fact used frequently to get vector operations to run as fast
as possible on a VAX.

	void vdadd3(a,b,c,dim)
	double *a, *b, *c;
	int dim;
	{
		/* Leaving out the code to take care of the dim%N extra elts. */
		/* As written, the do/while also assumes dim >= N. */
		dim /= N;
		do {
			*a++ = *b++ + *c++;
			*a++ = *b++ + *c++;
			.
			.
			.
			*a++ = *b++ + *c++;
			*a++ = *b++ + *c++;
		} while(--dim > 0);
	}

Just how big you can make N on a VAX is determined by how big the cache
is. At around N == 16 the instruction fetches start causing cache
misses. Wouldn't it have been nicer to have simply written

	void vdadd3(a,b,c,dim)
	double *a, *b, *c;
	int dim;
	{
		int i;

		for(i = 0; i < dim; i += 1) {
			a[i] = b[i] + c[i];
		}
	}

and have the compiler pick up the vector operation and put in all the
pointer craziness and the loop unrolling! In this context a vectorizing
compiler is even useful on a scalar machine.
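
For anyone who wants to try the comparison, here is a minimal sketch of
the two forms side by side, with the leftover-element code filled in.
Everything not in the routines above is an assumption made for
illustration: the unroll factor N == 4, the names vdadd3_simple,
vdadd3_unrolled, check_vdadd3 and MAXDIM, the use of ANSI-style
prototypes, and the choice to handle the dim%N leftovers before the
unrolled loop rather than after it. It is one possible way to write
the trick, not the routine actually used on the VAX.

```c
#include <assert.h>

#define N 4        /* unroll factor; assumed for this sketch */
#define MAXDIM 64  /* hypothetical upper bound for the self-check below */

/* Plain loop: the form one would like to hand to a vectorizing compiler. */
static void vdadd3_simple(double *a, const double *b, const double *c, int dim)
{
    int i;

    for (i = 0; i < dim; i += 1) {
        a[i] = b[i] + c[i];
    }
}

/* Hand-unrolled form. The dim % N leftover elements are done first,
 * one at a time, then the remaining dim / N blocks of N elements. */
static void vdadd3_unrolled(double *a, const double *b, const double *c, int dim)
{
    int rem = dim % N;
    int blocks = dim / N;

    while (rem-- > 0) {
        *a++ = *b++ + *c++;
    }
    while (blocks-- > 0) {
        /* N == 4 copies of the same statement */
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
    }
}

/* Hypothetical self-check: returns 1 if both versions produce the
 * same result for a vector of length dim (0 <= dim <= MAXDIM). */
static int check_vdadd3(int dim)
{
    double b[MAXDIM], c[MAXDIM], a1[MAXDIM], a2[MAXDIM];
    int i;

    assert(dim >= 0 && dim <= MAXDIM);
    for (i = 0; i < dim; i += 1) {
        b[i] = 1.5 * i;
        c[i] = 100.0 - i;
    }
    vdadd3_simple(a1, b, c, dim);
    vdadd3_unrolled(a2, b, c, dim);
    for (i = 0; i < dim; i += 1) {
        if (a1[i] != a2[i]) {
            return 0;
        }
    }
    return 1;
}
```

Calling check_vdadd3 with lengths such as 0, 3, 4 and 17 exercises the
empty case, a vector shorter than N, an exact multiple of N, and a
length with leftovers. Using while loops instead of do/while means a
dim smaller than N is handled without running the unrolled body at all.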