Xref: utzoo comp.unix.microport:1086 comp.unix.xenix:2776
Path: utzoo!attcan!uunet!husc6!ukma!gatech!hubcap!hutch
From: hutch@hubcap.UUCP (David Hutchens)
Newsgroups: comp.unix.microport,comp.unix.xenix
Subject: Re: speeding up compress on 286
Message-ID: <2314@hubcap.UUCP>
Date: 25 Jul 88 18:21:50 GMT
References: <2165@hubcap.UUCP>
Organization: Clemson University, Clemson, SC
Lines: 126

New improved version, now with assembly source.

Earlier I wrote:
>
> I don't know about Microport, but I have found that a LOT of time
> is spent doing long shifts on my Xenix system when I use a 16-bit
> compress. This is in part because the C compiler generates a call
> to a routine to do long shifts. What is worse, they coded the
> routine so that it is space efficient, rather than time efficient (It
> uses a total of 3 or 4 286 instructions looping through them as many
> times as the number of bits you wish to shift: i.e. it shifts one
> bit each time through the loop.) I found that I could write my
> own routine - using a grand total of 50 more bytes or so, and in doing
> so I decreased the time required to do a 16-bit compress by about 30%!
>
> I don't have the code in front of me but the basic idea was to use
> the 16-bit shift instructions and OR together the appropriate results.
> I suspect that for 1 and possibly 2 bit shifts the provided routine is
> faster, but compress does a lot of shifts of 10 bits or more, and with
> these, my routine wins by a BIG margin.

I received several replies requesting the source. Again, I must caution
that these routines are designed to work with Microsoft Xenix 2.0. I
don't have any idea whether they work with any other compiler/OS. It
turns out that the Microsoft compiler uses a non-standard call sequence
to call its own built-in routines, including the long shift operations.
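To make the quoted idea concrete, here is a minimal C sketch of the two strategies for a left shift, assuming the 32-bit long is held as two 16-bit halves the way the 286 holds it in a register pair. The names naive_shl32 and split_shl32 are hypothetical; neither is the Microsoft library routine nor the assembly posted here, just a model of one-bit-per-iteration looping versus whole 16-bit shifts ORed together.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the space-efficient library approach: shift a
   32-bit value, held as two 16-bit halves, one bit per loop iteration,
   so a shift by n costs n trips through the loop. */
static uint32_t naive_shl32(uint16_t lo, uint16_t hi, unsigned n)
{
    assert(n < 32);                              /* only counts 0..31 are meaningful */
    while (n-- > 0) {
        hi = (uint16_t)((hi << 1) | (lo >> 15)); /* carry top bit of lo into hi */
        lo = (uint16_t)(lo << 1);
    }
    return ((uint32_t)hi << 16) | lo;
}

/* Hypothetical model of the faster idea: use whole 16-bit shifts and OR
   the pieces together, so the work no longer grows with n. */
static uint32_t split_shl32(uint16_t lo, uint16_t hi, unsigned n)
{
    assert(n < 32);
    if (n >= 16) {                   /* low half lands entirely in the high half */
        hi = (uint16_t)((uint32_t)lo << (n - 16));
        lo = 0;
    } else if (n > 0) {              /* bits crossing the word boundary are ORed in */
        hi = (uint16_t)(((uint32_t)hi << n) | (lo >> (16 - n)));
        lo = (uint16_t)((uint32_t)lo << n);
    }
    return ((uint32_t)hi << 16) | lo;
}
```

The two branches in split_shl32 correspond to the two cases in the routines: a count of 16 or more needs only one 16-bit shift, while a smaller count needs two shifts and an OR. The naive loop runs n times, which is why the shifts of 10 bits or more that compress does are where the difference shows.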
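These assembly routines assume that the 32-bit number to be shifted is
in the AX (low-order word) and DX (high-order word) registers at entry
(that is where the Microsoft C compiler I'm using puts them). They
assume that the number of bits to shift is in the CL register. They
destroy the CH register (I'm not positive this is really safe, but it
works for the programs I have tried!).

I assemble the following with 'as' and link it with the compress source.

Best of luck. Remember to test it well before giving it any trust.

David Hutchens
hutch@hubcap.clemson.edu
...!gatech!hubcap!hutch

---------- CUT HERE -----------
; Static Name Aliases
;
        TITLE   shift

        .287
_TEXT   SEGMENT  BYTE PUBLIC 'CODE'
_TEXT   ENDS
CONST   SEGMENT  WORD PUBLIC 'CONST'
CONST   ENDS
_BSS    SEGMENT  WORD PUBLIC 'BSS'
_BSS    ENDS
DGROUP  GROUP   CONST, _BSS
        ASSUME  CS: _TEXT, DS: DGROUP, SS: DGROUP, ES: DGROUP
_TEXT   SEGMENT

; __lshr: signed long shift right, DX:AX >>= CL
        PUBLIC  __lshr
__lshr  PROC FAR
        cmp     cl,15
        jle     $LSRSMALL       ; counts 0..15 take the general path
        sub     cl,16           ; count >= 16: only the high word matters
        xchg    ax,dx           ; high word becomes the low word
        sar     ax,cl           ; arithmetic shift by the remaining count
        cwd                     ; sign-extend AX into DX
        ret
$LSRSMALL:
        mov     ch,cl           ; save count in CH (CH is destroyed)
        push    dx              ; save high word
        shr     ax,cl           ; low word >> count
        sub     cl,16
        neg     cl              ; CL = 16 - count
        shl     dx,cl           ; bits moving from high word into low word
        or      ax,dx           ; combine into new low word
        pop     dx              ; restore high word
        mov     cl,ch           ; restore count
        sar     dx,cl           ; high word >> count, keeping the sign
        ret
__lshr  ENDP

; __ulshr: unsigned long shift right, DX:AX >>= CL
        PUBLIC  __ulshr
__ulshr PROC FAR
        cmp     cl,15
        jle     $ULSRSMALL      ; counts 0..15 take the general path
        sub     cl,16           ; count >= 16: only the high word matters
        xchg    ax,dx           ; high word becomes the low word
        shr     ax,cl           ; logical shift by the remaining count
        sub     dx,dx           ; high word is zero
        ret
$ULSRSMALL:
        mov     ch,cl           ; save count in CH (CH is destroyed)
        push    dx              ; save high word
        shr     ax,cl           ; low word >> count
        sub     cl,16
        neg     cl              ; CL = 16 - count
        shl     dx,cl           ; bits moving from high word into low word
        or      ax,dx           ; combine into new low word
        pop     dx              ; restore high word
        mov     cl,ch           ; restore count
        shr     dx,cl           ; high word >> count, zero-filled
        ret
__ulshr ENDP

; __lshl: long shift left, DX:AX <<= CL
        PUBLIC  __lshl
__lshl  PROC FAR
        cmp     cl,15
        jle     $LSLSMALL       ; counts 0..15 take the general path
        sub     cl,16           ; count >= 16: only the low word survives
        mov     dx,ax           ; low word becomes the high word
        shl     dx,cl           ; shift by the remaining count
        sub     ax,ax           ; low word is zero
        ret
$LSLSMALL:
        mov     ch,cl           ; save count in CH (CH is destroyed)
        push    ax              ; save low word
        shl     dx,cl           ; high word << count
        sub     cl,16
        neg     cl              ; CL = 16 - count
        shr     ax,cl           ; bits moving from low word into high word
        or      dx,ax           ; combine into new high word
        pop     ax              ; restore low word
        mov     cl,ch           ; restore count
        shl     ax,cl           ; low word << count
        ret
__lshl  ENDP

_TEXT   ENDS
END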
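Since the routines only run on a 286, here is a hypothetical C model for checking their logic on another machine: each statement mirrors one instruction (named in the comments) acting on a pretend DX:AX/CL/CH register file. The names regs286, model_lshr, model_ulshr, model_lshl and run are invented for this sketch, and the shift helpers assume the 286 behavior of masking shift counts to 5 bits. It is not a substitute for testing the real object code with the real compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: DX:AX holds the long (AX = low word,
   DX = high word), CL the shift count, CH is scratch. */
typedef struct { uint16_t ax, dx; uint8_t cl, ch; } regs286;

/* 16-bit shifts as the 286 performs them: the count is masked to 5 bits,
   and a 16-bit shift by 16..31 leaves 0 (SHL/SHR) or all sign bits (SAR). */
static uint16_t shl16(uint16_t v, unsigned c)
{
    c &= 31;
    return c > 15 ? 0 : (uint16_t)((uint32_t)v << c);
}
static uint16_t shr16(uint16_t v, unsigned c)
{
    c &= 31;
    return c > 15 ? 0 : (uint16_t)(v >> c);
}
static uint16_t sar16(uint16_t v, unsigned c)
{
    c &= 31;
    if (c > 15) c = 15;                   /* more shifts only repeat the sign bit */
    uint16_t r = (uint16_t)(v >> c);
    if (v & 0x8000u)                      /* negative: fill vacated bits with 1s */
        r |= (uint16_t)((uint32_t)0xFFFFu << (16 - c));
    return r;
}

/* __lshr: signed DX:AX >>= CL */
static void model_lshr(regs286 *r)
{
    if (r->cl > 15) {                     /* cmp cl,15 / jle */
        r->cl -= 16;                      /* sub cl,16 */
        uint16_t t = r->ax; r->ax = r->dx; r->dx = t;  /* xchg ax,dx */
        r->ax = sar16(r->ax, r->cl);      /* sar ax,cl */
        r->dx = (r->ax & 0x8000u) ? 0xFFFFu : 0;       /* cwd */
        return;
    }
    r->ch = r->cl;                        /* mov ch,cl: CH is destroyed */
    uint16_t saved = r->dx;               /* push dx */
    r->ax = shr16(r->ax, r->cl);          /* shr ax,cl */
    r->cl = (uint8_t)(16 - r->cl);        /* sub cl,16 / neg cl */
    r->ax |= shl16(r->dx, r->cl);         /* shl dx,cl / or ax,dx */
    r->dx = saved;                        /* pop dx */
    r->cl = r->ch;                        /* mov cl,ch */
    r->dx = sar16(r->dx, r->cl);          /* sar dx,cl */
}

/* __ulshr: unsigned DX:AX >>= CL, same shape with logical shifts */
static void model_ulshr(regs286 *r)
{
    if (r->cl > 15) {
        r->cl -= 16;
        uint16_t t = r->ax; r->ax = r->dx; r->dx = t;  /* xchg ax,dx */
        r->ax = shr16(r->ax, r->cl);      /* shr ax,cl */
        r->dx = 0;                        /* sub dx,dx */
        return;
    }
    r->ch = r->cl;
    uint16_t saved = r->dx;               /* push dx */
    r->ax = shr16(r->ax, r->cl);
    r->cl = (uint8_t)(16 - r->cl);
    r->ax |= shl16(r->dx, r->cl);
    r->dx = saved;                        /* pop dx */
    r->cl = r->ch;
    r->dx = shr16(r->dx, r->cl);          /* shr dx,cl */
}

/* __lshl: DX:AX <<= CL */
static void model_lshl(regs286 *r)
{
    if (r->cl > 15) {
        r->cl -= 16;
        r->dx = shl16(r->ax, r->cl);      /* mov dx,ax / shl dx,cl */
        r->ax = 0;                        /* sub ax,ax */
        return;
    }
    r->ch = r->cl;
    uint16_t saved = r->ax;               /* push ax */
    r->dx = shl16(r->dx, r->cl);          /* shl dx,cl */
    r->cl = (uint8_t)(16 - r->cl);
    r->dx |= shr16(r->ax, r->cl);         /* shr ax,cl / or dx,ax */
    r->ax = saved;                        /* pop ax */
    r->cl = r->ch;
    r->ax = shl16(r->ax, r->cl);          /* shl ax,cl */
}

/* Load a 32-bit value and a count (0..31), run one routine, return DX:AX. */
static uint32_t run(void (*routine)(regs286 *), uint32_t v, unsigned n)
{
    assert(n < 32);
    regs286 r = { (uint16_t)v, (uint16_t)(v >> 16), (uint8_t)n, 0 };
    routine(&r);
    return ((uint32_t)r.dx << 16) | r.ax;
}
```

Comparing run(model_lshr, v, n) against an ordinary 32-bit arithmetic shift for every count from 0 to 31 is one cheap way to exercise the logic; it says nothing about whether the register conventions hold for compilers other than Microsoft's. The model also makes the CH side effect visible: after any call, r.ch holds the original count.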