Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!seismo!mimsy!chris
From: chris@mimsy.UUCP
Newsgroups: comp.arch
Subject: Re: 64 Vs 32
Message-ID: <6149@mimsy.UUCP>
Date: Sun, 5-Apr-87 14:25:46 EST
Article-I.D.: mimsy.6149
Posted: Sun Apr 5 14:25:46 1987
Date-Received: Sun, 5-Apr-87 23:39:01 EST
References: <7844@utzoo.UUCP> <563@sdiris1.UUCP>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 78

In article <563@sdiris1.UUCP> rgs@sdiris1.UUCP (Rusty Sanders) writes:
>... In fact, I know of at least one 32-bit mini computer that has
>a 64-bit cache to memory bus (Data General).

The Vax 11/780 has a 64 bit backplane (the SBI) between its cache
and its memory.

>This does add an interesting twist to optimizing compilers. It would
>improve program performance to have code segments start on a [superword]
>boundary. An obvious thing would be to place all subroutine entries
>at a boundary.

The Unix Vax assembler has a `.align' directive for such purposes,
but the compiler emits only `.align 1's, which align to 2**1 bytes
or 16 bit boundaries---probably because the first thing at each
routine is a short word containing a register save mask (and a few
other bits that are essentially never set anyway).

>The use of swords has the biggest benefit with "cache buster" types of
>programs.

... provided such programs were written carefully.  Similar to the
`cache buster' is the `VM buster': a program with multidimensional
arrays where the fastest-varying subscript is varied the slowest
(if that makes sense: if not, there is an example below).

>... Recoding as follows:
>vadd(size,a,b)
>        int size;
>        int a[2][],c[];
>{
>        while (--size)
>                c[size] = a[0][size] + a[1][size];
>}

This looks like the coder was `thinking FORTRAN and writing C',
which is often a performance disaster (as is `thinking C and writing
FORTRAN').  Aside from the nits:

>As soon as a[0][size] is accessed, a[1] is loaded into cache.
Since the last subscript varies fastest in C, as soon as a[0][size]
is accessed, a[0][size+1] is cached.  A C matrix add loop should read

        for (i = 0; i < size1; i++)
                for (j = 0; j < size2; j++)
                        c[i][j] = a[i][j] + b[i][j];
        /* and of course you can optimise with */
        /* pointers, if it really comes to that. */

while the Ratfor loop should read

        for (j = 1; j <= size; j = j + 1)
                for (i = 1; i <= size; i = i + 1)
                        c(i, j) = a(i, j) + b(i, j)

(subscripts *do* start at one in FORTRAN?).  Reversing the loops
can have terrible effects on performance, due to cache effects (as
described above) and due to `unexpected' VM behaviour (the scattered
array references cause excessive page faults).

(For the nit-p..., er, record, most likely what was meant was

        vadd(size, a, c)
                register int size;
                register int a[][2], c[];
                /* or register int (*a)[2], *c; */
        {

                while (--size >= 0)
                        c[size] = a[size][0] + a[size][1];
        }

)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP: seismo!mimsy!chris    ARPA/CSNet: chris@mimsy.umd.edu
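
To make the loop-order point concrete, here is a minimal sketch in
modern C (not part of the original article).  The sizes ROWS and COLS
and all four function names are hypothetical, chosen only for
illustration.  Both add routines compute the same sums; they differ
only in which subscript the inner loop varies, and the two stride
helpers report how far apart in memory successive references are in
each case.

```c
#include <stddef.h>

/* ROWS and COLS are hypothetical sizes chosen only for illustration. */
#define ROWS 4
#define COLS 3

/* Last subscript in the inner loop: successive iterations touch
 * adjacent ints, so neighbouring references fall in the same
 * cache line (and page) wherever possible. */
void madd_last_fastest(int c[ROWS][COLS], int a[ROWS][COLS], int b[ROWS][COLS])
{
    size_t i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            c[i][j] = a[i][j] + b[i][j];
}

/* Reversed loops: the same sums, but successive iterations are
 * COLS ints apart, which scatters the references across memory. */
void madd_first_fastest(int c[ROWS][COLS], int a[ROWS][COLS], int b[ROWS][COLS])
{
    size_t i, j;

    for (j = 0; j < COLS; j++)
        for (i = 0; i < ROWS; i++)
            c[i][j] = a[i][j] + b[i][j];
}

/* Byte distance between a[0][0] and a[0][1] (last subscript + 1). */
size_t stride_last(int a[ROWS][COLS])
{
    return (size_t)((char *)&a[0][1] - (char *)&a[0][0]);
}

/* Byte distance between a[0][0] and a[1][0] (first subscript + 1). */
size_t stride_first(int a[ROWS][COLS])
{
    return (size_t)((char *)&a[1][0] - (char *)&a[0][0]);
}
```

With arrays this small the difference is invisible; the stride in the
reversed version grows with the row length, so the penalty shows up
only once a row no longer fits in a cache line or, for the `VM buster'
case, in a page.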