Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 4.3bsd-beta 6/6/85; site cad.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!ucbvax!cad!rudell
From: rudell@cad.UUCP (Richard Rudell)
Newsgroups: net.arch
Subject: Re: Addressing modes (really VAX polyd instruction)
Message-ID: <78@cad.UUCP>
Date: Tue, 4-Mar-86 12:02:37 EST
Article-I.D.: cad.78
Posted: Tue Mar 4 12:02:37 1986
Date-Received: Fri, 7-Mar-86 04:04:52 EST
References: <946@garfield.UUCP> <1417@sdcsvax.UUCP> <6777@boring.UUCP> <1476@lanl.ARPA>
Organization: U. C. Berkeley CAD Group
Lines: 148
Keywords: RISC, optimiser, compiler
Summary: VAX Poly Instruction

> ...... What usually happens is
> that the 'high level' instruction is not as efficient as a sequence or
> loop of simpler instructions - or it is only efficient for a subset of
> its possible uses. ( As an example, consider the famous case of the VAX
> instruction for evaluating polynomials. It was never as fast as a loop
                                                 ^^^^^^^^^^^^^^^^^^^^^^^
> of simpler code to do the same job and it was horribly inefficient in
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> the case where the compiler could detect a large number of zero
> coefficients.) As a result, compilers must be VERY complex to generate
> good code on CISC machines.

When in doubt, run a test. The test program follows the message.
I tested the following:

    1) VAX polyd instruction
    2) simple loop in C to evaluate a polynomial
    3) simple loop in assembler using register variables
    4) unrolled loop in assembler (i.e., no branches)

The test was run for a degree 4 polynomial with no zero coefficients.

Output from a VAX 11/785 with FPA:

    count is 1000000
    Time for null loop was 1.0 microseconds
    Time for polyd was 15.3 microseconds
    21.128271
    Time for poly in C was 36.8 microseconds
    21.128271
    Time for poly asm loop was 28.5 microseconds
    21.128271
    Time for poly asm loop unrolled was 22.6 microseconds
    21.128271

Certainly seems to me that the polyd runs faster than the best I can do
in assembly language on a VAX.

Results summarized from a VAX 8650 and a MicroVax-II (with FPA) are:

                     VAX 8650    MicroVax-II
    VAX polyd         5.1 us       30.4 us
    C code            8.7 us       69.2 us
    asm loop          7.5 us       57.9 us
    asm unrolled      6.0 us       48.4 us

I have been told that polyd is NOT emulated on the MicroVax-II, and
these results seem to confirm this. The difference is less on the
8650, but still noticeable.

I will not argue the point of whether polyd is a frequent enough
instruction to warrant being included in an instruction set, nor
whether a VAX might have a cycle time n% less if polyd weren't
included. Nor will I comment on the efficiency or inefficiency of
evaluating a polynomial with many zero coefficients in this way.
Lastly, I will not comment on the difficulty of a compiler generating
code for an instruction which changes 6 of the 16 general purpose
registers.

My only point is that the VAX polyd instruction is FASTER than the
best one can do in VAX assembler given all nonzero coefficients.
Period.

So why is the VAX poly instruction so famous ?

Richard Rudell.

--------- program follows for those interested -----------

/* quick hack to time polynomial instruction */
/* WARNING: highly VAX 4.3bsd specific !!!
 */

#include <stdio.h>
#include <sys/types.h>
#include <sys/times.h>

/* user CPU time in seconds; tms_utime is in 1/60 second ticks here */
double ptime()
{
    struct tms buffer;
    double time;

    times(&buffer);
    time = buffer.tms_utime / 60.0;
    return time;
}

/*
 * Run `action' count times and print the time per iteration, less the
 * null loop overhead.  Uses `count' and `offset' from the caller.
 */
#define TIME(string, action) {\
    double time = ptime(), timex; register int i_;\
    for(i_ = 0; i_ < count; i_++) {action;}\
    time = ptime() - time - offset;\
    printf("Time for %s was %3.1f microseconds\n", string, time/count*1e6);\
    if (strcmp(string, "null loop") == 0) offset = time;\
}

/* highest-order coefficient first, as polyd expects */
double coef[12] = {1.0,-2.0,3.0,-4.0,5.0,-6.0,7.0,-8.0,9.0,-10.0};

main(argc, argv)
int argc;
char **argv;
{
    /* the asm code below depends on these frame offsets */
    double f,                                       /* -8(fp) */
           g=3.141512341222419283749239849839842,   /* -16(fp) */
           h=2.7182812454219283748238748392,        /* -24(fp) */
           offset=0.0;                              /* -32(fp) */
    register int i, count;

    for(i = 0; i < 12; i++) coef[i] /= h;   /* mix things up a little */

    if (argc >= 2)
        count = atoi(argv[1]);
    else
        count = 100000;
    printf("count is %d\n", count);

    /* must run first: sets the loop overhead subtracted below */
    TIME("null loop", ;);

    /* 1) polyd: argument g, degree 4, table _coef; result in r0/r1 */
    TIME("polyd",
        asm("polyd -16(fp),$4,_coef");
        asm("movd r0,-8(fp)"));
    printf("%f\n", f);

    /* 2) Horner's rule in C */
    TIME("poly in C",
        f=coef[0]; for(i=1;i<=4;i++) f=f*g+coef[i];);
    printf("%f\n", f);

    /* 3) the same loop in assembler using registers */
    TIME("poly asm loop",
        asm("moval _coef,r3");
        asm("movd (r3),r0");
        asm("movd -16(fp),r4");
        asm("movl $1,r2");
        asm("Lxx: muld2 r4,r0");
        asm("addd2 (r3)[r2],r0");
        asm("aobleq $4,r2,Lxx");
        asm("movd r0,-8(fp)"));
    printf("%f\n", f);

    /* 4) the loop unrolled, so no branches */
    TIME("poly asm loop unrolled",
        asm("moval _coef,r3");
        asm("movd -16(fp),r4");     /* move arg to r4 */
        asm("movd (r3)+,r0");
        asm("muld2 r4,r0");
        asm("addd2 (r3)+,r0");
        asm("muld2 r4,r0");
        asm("addd2 (r3)+,r0");
        asm("muld2 r4,r0");
        asm("addd2 (r3)+,r0");
        asm("muld2 r4,r0");
        asm("addd3 (r3),r0,-8(fp)"));
    printf("%f\n", f);
}
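
For anyone without a VAX at hand, here is a minimal portable sketch of the
computation that all four variants above perform: Horner's rule over a
coefficient table stored highest-order term first. The function names
horner and post_example are made up for illustration and are not part of
the test program. It assumes IEEE doubles rather than VAX D-float, so the
last digits may differ slightly, and it says nothing about timing.

```c
#include <assert.h>

/*
 * Horner's rule.  coef[0] is the highest-order coefficient, so a
 * polynomial of the given degree uses coef[0..degree].
 * (Hypothetical helper, not from the original program.)
 */
double horner(const double *coef, int degree, double x)
{
    double f;
    int i;

    assert(degree >= 0);
    f = coef[0];
    for (i = 1; i <= degree; i++)
        f = f * x + coef[i];    /* one multiply and one add per term */
    return f;
}

/*
 * Same setup as the test program: the first five coefficients,
 * each divided by h, evaluated at g with degree 4.
 * (Hypothetical wrapper, not from the original program.)
 */
double post_example(void)
{
    double coef[5] = {1.0, -2.0, 3.0, -4.0, 5.0};
    double g = 3.141512341222419283749239849839842;
    double h = 2.7182812454219283748238748392;
    int i;

    for (i = 0; i < 5; i++)
        coef[i] /= h;
    return horner(coef, 4, g);
}
```

Printing post_example() with "%f" should give a value that agrees closely
with the 21.128271 reported above. The loop does one multiply and one add
per coefficient after the first, which is the same work the unrolled
assembler spells out by hand.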