Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!umd5!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.arch
Subject: Re: The VAX Always Uses Fewer Instructions
Message-ID: <11981@mimsy.UUCP>
Date: 15 Jun 88 20:16:09 GMT
References: <6921@cit-vax.Caltech.Edu> <28200161@urbsdc> <10595@sol.ARPA>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 65

In article <10595@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:
>For example, the loop to add two vectors into a third on the VAX is:
>
>    top: addl3   (rA)+, (rB)+, (rC)+
>         sobgeq  rD, top
>
>which takes seven bytes for two instructions.

True.  An optimising compiler might expand the loop, however:

        extzv   $0,$3,rD,r0
        bicl2   r0,rD           # or bicl2 $7; same length
        casel   r0,$0,$7        # start the right distance in
    9:  .word   0f - 9b         # 0
        .word   1f - 9b         # 1
        ...
        .word   7f - 9b         # 7
    7:  addl3   (rA)+,(rB)+,(rC)+
    6:  addl3   (rA)+,(rB)+,(rC)+
    5:  addl3   (rA)+,(rB)+,(rC)+
    4:  addl3   (rA)+,(rB)+,(rC)+
    3:  addl3   (rA)+,(rB)+,(rC)+
    2:  addl3   (rA)+,(rB)+,(rC)+
    1:  addl3   (rA)+,(rB)+,(rC)+
    0:  addl3   (rA)+,(rB)+,(rC)+
        acbl    $0,$-8,rD,7b    # while (rD-=8) >= 0

This pushes the size up to (I think) 70 bytes.  Too bad the RISC
machines are still faster anyway :-) .

Actually, you could get rid of the case and the branch table:

        extzv   $0,$3,rD,r0
        bicl2   r0,rD
        subl3   r0,$7,r0        # invert
        ashl    $2,r0,r0        # times 4, size of addl3 instr below
        jmp     (pc)[r0]        # into the breach (or is it breech?...kapow!
    0:  addl3   (rA)+,(rB)+,(rC)+       # maybe an ancient muzzle loader :-) )
        addl3   (rA)+,(rB)+,(rC)+
        addl3   (rA)+,(rB)+,(rC)+
        addl3   (rA)+,(rB)+,(rC)+
        addl3   (rA)+,(rB)+,(rC)+
        addl3   (rA)+,(rB)+,(rC)+
        addl3   (rA)+,(rB)+,(rC)+
        addl3   (rA)+,(rB)+,(rC)+
        acbl    $0,$-8,rD,0b

This drops off 9 bytes, down to 61 bytes.  You can get rid of 5 more
bytes by changing the acbl into

        subl2   $8,rD
        bgeq    0b

but on non-pipelined VAXen that might be slower.

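For anyone who would rather check the iteration count in C than in VAX assembly, here is a rough model of the casel version. It is only a sketch: the function name vec_add_unrolled, the use of int for longwords, and the count and final_d bookkeeping are all assumptions made for illustration, and the C control flow stands in for (not reproduces) the casel branch table.

```c
/* Hypothetical C model of the 8-way unrolled loop with computed entry.
 * d plays the role of rD: like the original sobgeq loop, d + 1 elements
 * are processed (d must be >= 0).  Returns the number of additions done
 * and stores the final value of d through final_d (bookkeeping only,
 * not part of the original assembly). */
#define STEP() (*c++ = *a++ + *b++, ++count)   /* one addl3 (rA)+,(rB)+,(rC)+ */

long vec_add_unrolled(const int *a, const int *b, int *c, long d, long *final_d)
{
    long count = 0;
    long r0 = d & 7;            /* extzv $0,$3,rD,r0: low three bits of rD */

    d &= ~7L;                   /* bicl2 r0,rD: round rD down to a multiple of 8 */

    switch (r0) {               /* casel r0,$0,$7: enter the right distance in */
        do {
    case 7: STEP();
    case 6: STEP();
    case 5: STEP();
    case 4: STEP();
    case 3: STEP();
    case 2: STEP();
    case 1: STEP();
    case 0: STEP();
        } while ((d -= 8) >= 0); /* acbl $0,$-8,rD,7b */
    }

    *final_d = d;               /* ends at -8, where the sobgeq loop ends at -1 */
    return count;
}
```

Called with d = 10, this does 3 additions on the partial first pass and 8 on the next, 11 in all, and leaves d at -8.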
Alternatively, if you have another register free, do `mnegl $8,r1' and
then use r1 in the acbl instead of $-8; this saves only 1 byte overall,
but brings the acbl itself down to 6 bytes.

[nb. the sobgeq loop above runs rD+1 times, so I made the acbl loops do
the same.  rD is left in a different state (-8 vs -1), and I did need
r0 for entry calculation.]

All of this just goes to show that the VAX provides too many ways to
do things!
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path: uunet!mimsy!chris