Path: utzoo!mnetor!uunet!husc6!cmcl2!nrl-cmf!ames!pasteur!ucbvax!hplabs!pyramid!prls!mips!hansen
From: hansen@mips.COM (Craig Hansen)
Newsgroups: comp.lang.c
Subject: Re: Bad optimizations (was Re: comma operator)
Message-ID: <1391@mips.mips.COM>
Date: 22 Jan 88 20:19:57 GMT
References: <3819@sigi.Colorado.EDU> <5080013@hpfcdc.HP.COM> <7120@brl-smoke.ARPA> <39616@sun.uucp>
Lines: 115
Keywords: optimization swap sun compiler
Summary: a good compiler doesn't confuse the issue

In article <39616@sun.uucp>, dgh%dgh@Sun.COM (David Hough) writes:
> The above reflects what's called -O1 optimization in SunOS 4.0,
> currently under development.  I tried the example program under
> maximal optimization (-O4) and it optimized away to just about nothing,
> so I revised the program:
>
> #define swap(a,b) ((a) = ((b) = ((a) = (a) ^ (b)) ^ (b)) ^ (a))
> #define swap2(A, B, TYPE) { TYPE temp; temp = A; A = B; B = temp; }
>
> main()
> {
>     int a,b;
>
>     getab(&a,&b);      /* pretend that a and b get new values */
>     swap(a, b);
>     foo_marker(a,b);   /* marker to delineate code sections */
>     getab(&a,&b);      /* pretend that a and b get new values */
>     swap2(a, b, int);
>     foo_marker(a,b);   /* marker to delineate code sections */
> }
>
> and found:
>
>     jbsr    _getab
>     addqw   #8,sp
>     movl    a6@(-8),d0          ex or macro
>     eorl    d0,a6@(-4)
>     movl    a6@(-4),d0
>     eorl    d0,a6@(-8)
>     movl    a6@(-8),d0
>     eorl    d0,a6@(-4)
>     movl    d0,sp@-
>     movl    a6@(-4),sp@-
>     jbsr    _foo_marker
>     ...
>     jbsr    _getab
>     addqw   #8,sp
>     movl    a6@(-4),d7          move via temp
>     movl    a6@(-8),a6@(-4)
>     movl    d7,a6@(-8)
>     movl    d7,sp@-
>     movl    a6@(-4),sp@-
>     jbsr    _foo_marker
>
> 8 instructions instead of 5, the difference being the 3 eorl's.
> In general, you wouldn't use the EOR method unless you were out of
> space for temporaries, which might be the case if you were microcoding
> within a small register file, but is hardly typical of C applications.

Maybe a better compiler would keep this from getting out of hand.
Dave Hough's version of the example turns into the following code
on a MIPS R2000 (-O level optimization):

main:
  File 'swap.c':
   0: #define swap(a,b) ((a) = ((b) = ((a) = (a) ^ (b)) ^ (b)) ^ (a))
   1: #define swap2(A, B, TYPE) { TYPE temp; temp = A; A = B; B = temp; }
   2:
   3: main()
   4: {
  [swap.c:   5] 0x0:    27bdffd8    addiu   sp,sp,-40
  [swap.c:   5] 0x4:    afbf0014    sw      ra,20(sp)
   5:     int a,b;
   6:
   7:     getab(&a,&b);      /* pretend that a and b get new values */
  [swap.c:   8] 0x8:    27a40024    addiu   a0,sp,36
  [swap.c:   8] 0xc:    0c000000    jal     getab
  [swap.c:   8] 0x10:   27a50020    addiu   a1,sp,32
  [swap.c:   8] 0x14:   8fa30024    lw      v1,36(sp)
  [swap.c:   8] 0x18:   8fa60020    lw      a2,32(sp)
   8:     swap(a, b);
  [swap.c:   9] 0x1c:   00000000    nop
  [swap.c:   9] 0x20:   00661826    xor     v1,v1,a2
  [swap.c:   9] 0x24:   00663026    xor     a2,v1,a2
  [swap.c:   9] 0x28:   00c31826    xor     v1,a2,v1
   9:     foo_marker(a,b);   /* marker to delineate code sections */
  [swap.c:  10] 0x2c:   00602021    move    a0,v1
  [swap.c:  10] 0x30:   afa30024    sw      v1,36(sp)
  [swap.c:  10] 0x34:   00c02821    move    a1,a2
  [swap.c:  10] 0x38:   0c000000    jal     foo_marker
  [swap.c:  10] 0x3c:   afa60020    sw      a2,32(sp)
  10:     getab(&a,&b);      /* pretend that a and b get new values */
  [swap.c:  11] 0x40:   27a40024    addiu   a0,sp,36
  [swap.c:  11] 0x44:   0c000000    jal     getab
  [swap.c:  11] 0x48:   27a50020    addiu   a1,sp,32
  [swap.c:  11] 0x4c:   8fa30024    lw      v1,36(sp)
  [swap.c:  11] 0x50:   8fa60020    lw      a2,32(sp)
  11:     swap2(a, b, int);
  [swap.c:  12] 0x54:   00601021    move    v0,v1
  [swap.c:  12] 0x58:   00c01821    move    v1,a2
  [swap.c:  12] 0x5c:   00403021    move    a2,v0
  12:     foo_marker(a,b);   /* marker to delineate code sections */
  [swap.c:  13] 0x60:   afa60020    sw      a2,32(sp)
  [swap.c:  13] 0x64:   00602021    move    a0,v1
  [swap.c:  13] 0x68:   afa30024    sw      v1,36(sp)
  [swap.c:  13] 0x6c:   0c000000    jal     foo_marker
  [swap.c:  13] 0x70:   00402821    move    a1,v0
  13: }
  [swap.c:  14] 0x74:   8fbf0014    lw      ra,20(sp)
  [swap.c:  14] 0x78:   27bd0028    addiu   sp,sp,40
  [swap.c:  14] 0x7c:   03e00008    jr      ra
  [swap.c:  14] 0x80:   00000000    nop

This brings us back to reality.
The exor version uses two registers to swap the values and the temp
version uses three; both take three instructions. The exor version is
slightly slower here because its first xor needs both operands, and b
is used immediately after being loaded into a register, so a load
delay occurs (the nop at 0x1c). In more typical code, this wouldn't
be a factor.

-- 
Craig Hansen
Manager, Architecture Development
MIPS Computer Systems, Inc.
...{ames,decwrl,prls}!mips!hansen or hansen@mips.com
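
For anyone who wants to repeat the comparison on their own compiler,
here is a minimal sketch of the two swaps in a form that compiles on
its own. The names SWAP_XOR, SWAP_TEMP, get_ab_stub and demo_swaps are
hypothetical and do not come from the original program; get_ab_stub
merely stands in for getab(). The one-expression swap() macro from the
post assigns to a twice with no sequence point in between, which ISO C
leaves undefined, so the sketch spells the exor swap as three separate
statements instead. That is a substitution, not the code that produced
the listings above.

```c
#include <assert.h>
#include <stdio.h>

/* Exor swap as three sequenced statements (hypothetical name).
   a and b must be distinct objects: if both name the same object,
   the first ^= zeroes it. */
#define SWAP_XOR(a, b) \
    do { (a) ^= (b); (b) ^= (a); (a) ^= (b); } while (0)

/* Swap through a temporary, following swap2 from the post. */
#define SWAP_TEMP(A, B, TYPE) \
    do { TYPE temp_ = (A); (A) = (B); (B) = temp_; } while (0)

/* Hypothetical stand-in for getab(): hands back fixed values. */
void get_ab_stub(int *a, int *b)
{
    *a = 3;
    *b = 5;
}

/* Runs both swaps on fresh values; returns 1 if each one exchanged them. */
int demo_swaps(void)
{
    int a, b;
    int ok = 1;

    get_ab_stub(&a, &b);
    SWAP_XOR(a, b);
    printf("exor: a=%d b=%d\n", a, b);
    ok = ok && (a == 5 && b == 3);

    get_ab_stub(&a, &b);
    SWAP_TEMP(a, b, int);
    printf("temp: a=%d b=%d\n", a, b);
    ok = ok && (a == 5 && b == 3);

    return ok;
}
```

Calling demo_swaps() from main() and compiling with optimization
turned on, then looking at the generated assembly, should show whether
a given compiler keeps both values in registers the way the R2000
listing does. Because get_ab_stub() is visible in the same file, an
optimizer may fold the whole thing to constants; moving it to a
separate file avoids that.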