Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!uwmcsd1!nic.MR.NET!umn-cs!umn-d-ub!uwvax!oddjob!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.arch
Subject: CISC instructions
Message-ID: <13254@mimsy.UUCP>
Date: 27 Aug 88 05:32:39 GMT
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 101

Someone recently mentioned something that reminded me to try
open-coding the VAX `It sings, it dances!  It leaps and it prances!
It's a dessert topping *and* a floor wax!' subroutine call instruction
(`calls').  The results of a trivial test, on a VAX-11/785:

    calls, no arguments, null function (1 million iterations):
        13.5 user seconds
    open-coded, no arguments, null function:
        14.6 user seconds

Here they are:

    /* calls version */
            .globl  _null
    _null:  .word   0
            ret

            .globl  _main
    _main:  .word   0
            movl    $1000000,r11
    0:      calls   $0,_null
            sobgtr  r11,0b
            ret

    /* open-coded version */
            .globl  _null
    _null:  .word   0
            ret

            .globl  _main
    _main:  .word   0
            movl    $1000000,r11
    0:      pushl   $0              # nargs
            movl    sp,r2           # new ap
            moval   _null,r0        # routine to call
            movzwl  (r0)+,r1        # get register mask
            pushr   r1              # save registers
            movab   1f,-(sp)        # fr_savpc
            movq    ap,-(sp)        # fr_savfp, fr_savap
            bisw2   $0x2000,r1      # fake a `calls'
            ashl    $16,r1,-(sp)    # save mask in 16..27: psw=0
            pushl   $0
            movl    r2,ap           # set ap
            movl    sp,fp           # set fp
            jmp     (r0)            # `call'
    1:      sobgtr  r11,0b
            ret

Some notes about the open coded version:

It conforms to the `calls' frame format (although using some other
format could make it much faster: see below).

It does not save the current psw, losing the condition codes, and
perhaps more importantly, the trace bit, and some trap bits (which as
it happens are zero anyway, at least during all normal operation).

It does not align the stack.  (The stack should never become unaligned
anyway.)

It clobbers registers r0 through r2.  (These are always free at
subroutine call boundaries in the Berkeley VAX compilers.)
I found that `pushl $0' is faster than `clrl -(sp)' (though both are
two bytes long---most peculiar).

If I am allowed to avoid the standard stack frame format, I can cut
the time to 7.1 seconds:

    /* modified open coded call */
            .globl  _null
    _null:  movl    sp,fp           # build frame
            rsb

            .globl  _main
    _main:  .word   0
            movl    $1000000,r11
    0:      movq    ap,-(sp)        # save ap, fp
            moval   4(sp),ap        # new ap
            jsb     _null           # call
            movq    (sp)+,ap        # restore ap, fp
            sobgtr  r11,0b
            ret

This is somewhat less realistic as no registers are saved, and none
restored; if `null' were to use some, it would have to read:

    _null:  pushr   $mask           # save local registers
            movl    sp,fp           # build frame
            /* body */
            movl    fp,sp           # set up for return
            popr    $mask           # restore registers
            rsb

which adds three instructions, two of them relatively slow (pushr and
popr), changing the time (for mask=0) to 8.7 seconds.

Summary: the fancy VAX instruction call is severe overkill.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path:   uunet!mimsy!chris
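
For anyone who wants the four timings quoted in the post on a common
footing, here is a rough sketch in Python.  It uses only the numbers
given in the post (user seconds per one million iterations).  The
labels and the helper names `per_iteration_us' and `relative_to_calls'
are made up for illustration.  Each figure still includes the loop
overhead (sobgtr) and the return, so these are per-iteration costs,
not pure call costs.

```python
# Timings quoted in the post: user seconds for one million iterations
# on a VAX-11/785.  The dictionary labels are illustrative only.
ITERATIONS = 1_000_000

timings = {
    "calls": 13.5,
    "open-coded, calls frame": 14.6,
    "jsb, no register save": 7.1,
    "jsb, pushr/popr, mask=0": 8.7,
}


def per_iteration_us(seconds, iterations=ITERATIONS):
    """Convert a total run time into microseconds per loop iteration."""
    return seconds / iterations * 1e6


def relative_to_calls(seconds):
    """Express a run time as a fraction of the plain `calls' run time."""
    return seconds / timings["calls"]


if __name__ == "__main__":
    for name, secs in timings.items():
        print(f"{name:26s} {per_iteration_us(secs):5.1f} us/iter  "
              f"{relative_to_calls(secs):.2f} x calls")
```

Because the run was one million iterations, the seconds figure is also
the microseconds-per-iteration figure; the ratio column is the more
useful one for comparing the variants.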