Xref: utzoo comp.arch:10560 comp.lang.misc:3059
Path: utzoo!attcan!uunet!dino!atanasoff!hascall
From: hascall@atanasoff.cs.iastate.edu (John Hascall)
Newsgroups: comp.arch,comp.lang.misc
Subject: Re: Programming and Machine Operations
Message-ID: <1178@atanasoff.cs.iastate.edu>
Date: 9 Jul 89 15:32:39 GMT
References: <57125@linus.UUCP> <1989Jun24.230056.27774@utzoo.uucp> <13970@haddock.ima.isc.com> <1398@l.cc.purdue.edu> <13979@haddock.ima.isc.com>
Reply-To: hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall)
Organization: Iowa State Univ. Computation Center
Lines: 73

In article <13979@haddock.ima.isc.com> suitti@haddock.ima.isc.com (Stephen Uitti) writes:
>In various articles, cik & suitti argue...
>>> The code sequence is: ...
>>>
>>> void lvecadd(a, b, c, s)  /* a = b + c, length s */
>>> long *a; long *b; long *c; long s;
>>> {
>>>     do {
>>>         *a++ = *b++ + *c++;
>>>     } while (--s != 0);
>>> }
>For the VAX:
>L18:    addl3   (r9)+,(r10)+,r0
>        movl    r0,(r11)+
>        decl    r8
>        jneq    L18
>>I am afraid I will have to give you a D- on this.  Most of the time I would
>>not even bother with a call, considering the code length.  But the code is
>>bad for vectors of length >2.
>>How about
>>{
>>    end = a + s;
>>    do {
>>        *a++ = *b++ + *c++;
>>    } while (a < end);
>>}
>L26:    addl3   (r9)+,(r10)+,r0
>        movl    r0,(r11)+
>        cmpl    r11,r7
>        jlss    L26

Of course, both routines fall in the dumper if the vector length is 0 :-)

void lvecadd(a, b, c, s)
register long *a, *b, *c, s;    /* a[0..s-1] = b[0..s-1] + c[0..s-1] */
{
    if (s <= 0) return;         /* never trust your callers */
    do {
        *a++ = *b++ + *c++;
    } while (--s > 0);
}

giving (using VAX C V2.4-026 [VMS]):

                 0000  lvecadd:
           007C  0000          .entry  lvecadd,^m<r2,r3,r4,r5,r6>
       5E 04 C2  0002          subl2   #4,sp
    53 04 AC D0  0005          movl    4(ap),r3
    52 08 AC D0  0009          movl    8(ap),r2
    51 0C AC D0  000D          movl    12(ap),r1
    50 10 AC D0  0011          movl    16(ap),r0
          01 14  0015          bgtr    sym.1
             04  0017          ret
                 0018  sym.1:
    83 81 82 C1  0018          addl3   (r2)+,(r1)+,(r3)+
          50 D7  001C          decl    r0
          F8 14  001E          bgtr    sym.1

I'm not sure why "sobgtr r0,sym.1" wasn't used for the last two
instructions; is "decl r0 / bgtr sym.1" faster??  Does the optimizer not
recognize the decrement/branch sequence??  [Maybe someone with a newer
version of the compiler could try it.]

Why are r4,r5,r6 being saved?  And what the devil is "subl2 #4,sp" for?
Can't the optimizer clean up after itself??  At least it got the
important part (the loop) right by using (rn)+ for all the operands.

John Hascall
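For anyone who wants to try the loop shapes from this thread on a current compiler, here is a minimal sketch in ANSI C, with prototypes in place of the K&R declarations. lvecadd is the guarded count-down version above. lvecadd_ptr and lvecadd_while are hypothetical names: the first is the quoted pointer-comparison loop with the same zero-length guard added, the second is a top-tested loop that needs no separate guard. Nothing here claims what instructions any particular compiler will emit.

```c
/* Guarded count-down loop from the article, with an ANSI prototype.
   Computes a[i] = b[i] + c[i] for i = 0..s-1; does nothing if s <= 0. */
void lvecadd(long *a, const long *b, const long *c, long s)
{
    if (s <= 0)
        return;                 /* never trust your callers */
    do {
        *a++ = *b++ + *c++;
    } while (--s > 0);
}

/* Hypothetical name: the quoted pointer-comparison loop
   (stop when a reaches a + s) with the same zero-length guard. */
void lvecadd_ptr(long *a, const long *b, const long *c, long s)
{
    long *end;

    if (s <= 0)
        return;
    end = a + s;
    do {
        *a++ = *b++ + *c++;
    } while (a < end);
}

/* Hypothetical name: top-tested form.  s is checked before the first
   store, so s == 0 or s < 0 performs no stores at all. */
void lvecadd_while(long *a, const long *b, const long *c, long s)
{
    for (; s > 0; s--)
        *a++ = *b++ + *c++;
}
```

The top-tested form tests s > 0 before decrementing, so it never decrements past zero. An optimizer is free to rotate it into the guard-then-do/while shape seen in the VAX listing above; whether it does, and whether it then uses something like sobgtr for the loop branch, depends on the compiler.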