Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uunet!mcsun!sunic!kullmar!pkmab!ske
From: ske@pkmab.se (Kristoffer Eriksson)
Newsgroups: comp.lang.misc
Subject: Re: Common subexpression optimization
Message-ID: <2843@pkmab.se>
Date: 9 Feb 90 21:26:46 GMT
References: <4561@scolex.sco.COM> <14214@lambda.UUCP> <PCG.90Jan29131938@rupert.cs.aber.ac.uk> <2217@sunset.MATH.UCLA.EDU> <PCG.90Feb1164820@rupert.cs.aber.ac.uk> <4466@brazos.Rice.edu> <PCG.90Feb5183407@rupert.cs.aber.ac.uk> <4616@brazos.Rice.edu>
Organization: Peridot Konsult i Mellansverige AB, Oerebro, Sweden
Lines: 107

In article <4616@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
>I'm still hoping to see somebody recode matrix multiply in C, using 
>all the source hacks they want, so we can compare it to what we get
>using Fortran and our (not mine, friends') latest transformations.
>
>We'd run it on a MIPS M/120, with 32 integer registers and 16 fp regs,

>The Fortran source is
>
>	subroutine mm(A, B, C, n)
>	do 1 i = 1, n
>	  do 1 j = 1, n
>	    A(i, j) = 0.0
>	    do 1 k = 1, n
>1	      A(i, j) = A(i, j) + B(i, k) * C(k, j)
>	end

I'm not a Fortran programmer, so I am not sure exactly how this routine
works. I guess that A, B, and C are arrays and n is the maximum index in
these arrays that is to participate in the multiply, and that the array
bounds are automatically passed to routine, without explicit mention.

That is not "natural" in C, thus we have the problem of what to do instead...

Most C compilers don't accept variable size array arguments. The simplest
way to code exactly the same functionality without that facility, I think,
is to use a macro. The compiler will have the information about the array
bounds available for the arrays the macro is used on. Each of the three
arrays can have a different size, and they need coincide with n. Maybe
you never would use it that way, but it is the same as what I suppose
the Fortran routine does.

----cut----
#define mm(A, B, C, n)					\
    {							\
	register unsigned idxsize k, j, i;		\
	register float acc;				\
							\
	for(i = 0; i < n; ++i) {			\
	    for(j = 0; j < n; ++j) {			\
		acc = 0.0;				\
		for(k = 0; k < n; ++k)			\
			acc += B[i][k] * C[k][j];	\
		A[i][j] = acc;				\
	    }						\
	}						\
    }

/* Simple test case */
typedef short idxsize;
float a[22][16], b[17][32], c[45][16];
main()
{
    mm(a,b,c,16);
}
----cut----

I don't bother using pointers (or reversing the looping direction). I don't
see any reason why the compiler should not have the possibility of optimizing
the array indexing on it's own. Probably many compilers don't do that, but I
think it should in principle be possible for the C-compiler to optimize this
routine in the same way as the Fortran compiler does. (This being a macro,
the C-compiler will even generate in-line code and use constant array sizes,
even without global code analyses.) I don't know about MIPS' compilers.

If I instead decide that n gives the array size (maximum indexes) for all
arrays, I can write a simple function for it:

----cut----
/* Simple test case */
typedef short idxsize;
float a[16][16], b[16][16], c[16][16];
main()
{
	mm(a,b,c,16);
}

/* Matrix multiply. */
mm(A, B, C, n)
    register n;
    register float *B, *C;
    float *A;
{
    register unsigned idxsize k, j, i;
    register float acc;

    for(i = 0; i < n; ++i) {
	for(j = 0; j < n; ++j) {
	    acc = 0.0;
	    for(k = 0; k < n; ++k)
		acc += B[i * n + k] * C[k * n + j];
	    A[i * n + j] = acc;
	}
    }
}
----cut----

I could write a function that takes differing array sizes too. It isn't
very hard, but there are many ways to do it to choose among.

If nothing else works, it should be possible to explicitly code most
of the things the Fortran compiler might do automatically. That's the
character of C, for good or worse.
-- 
Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden
Phone: +46 19-13 03 60  !  e-mail: ske@pkmab.se
Fax:   +46 19-11 51 03  !  or ...!{uunet,mcvax}!sunic.sunet.se!kullmar!pkmab!ske