Xref: utzoo comp.lang.c:12348 comp.arch:6256 Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!vsi1!daver!mips!earl From: earl@mips.COM (Earl Killian) Newsgroups: comp.lang.c,comp.arch Subject: Re: Loop unfolding Message-ID: <3010@wright.mips.COM> Date: 2 Sep 88 05:11:56 GMT References: <5612@june.cs.washington.edu> Lines: 62 In-reply-to: rik@june.cs.washington.edu's message of 1 Sep 88 01:03:05 GMT In article <5612@june.cs.washington.edu> rik@june.cs.washington.edu (Rik Littlefield) writes: ... As I said, unrolling was a very effective way of removing virtually all the overhead from these loops. Can anyone suggest other solutions analogous to the alternatives mentioned above, or for that matter, any better solution other than assembly language? I think unrolling is the right answer, but I would suggest just writing straight-forward C and using a good compiler instead of doing it in the source. I did that for your examples. Here are the inner loops: for (i = 0; i < n; i += 1) s += p[i]; lw t2,0(a1) lw t3,4(a1) lw t4,8(a1) addu v1,v1,t2 lw t5,12(a1) addu v1,v1,t3 addiu a1,a1,16 addu v1,v1,t4 bne a1,a2,0x68 addu v1,v1,t5 for (i = 0; i < n; i += 1) p[i] = *q[i]; lw t7,0(a2) lw t9,4(a2) lw t8,0(t7) lw t0,8(a2) sw t8,0(a1) lw t1,0(t9) lw t3,12(a2) sw t1,4(a1) lw t2,0(t0) addiu a2,a2,16 sw t2,8(a1) lw t4,0(t3) addiu a1,a1,16 bne a2,a3,0x118 sw t4,-4(a1) volatile int *r; for (i = 0; i < n; i += 1) *r = p[i]; lw t0,0(a1) addiu a1,a1,16 sw t0,0(a2) lw t2,-12(a1) nop sw t2,0(a2) lw t3,-8(a1) nop sw t3,0(a2) lw t4,-4(a1) bne a1,a3,0x1c4 sw t4,0(a2) For the last an assembly language coder could do better because he can know that r and p aren't aliased and get rid of the nops. For the others, the compiler's code is hard to beat. -- UUCP: {ames,decwrl,prls,pyramid}!mips!earl USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086