Xref: utzoo comp.unix.microport:1086 comp.unix.xenix:2776
Path: utzoo!attcan!uunet!husc6!ukma!gatech!hubcap!hutch
From: hutch@hubcap.UUCP (David Hutchens)
Newsgroups: comp.unix.microport,comp.unix.xenix
Subject: Re: speeding up compress on 286
Message-ID: <2314@hubcap.UUCP>
Date: 25 Jul 88 18:21:50 GMT
References: <2165@hubcap.UUCP>
Organization: Clemson University, Clemson, SC
Lines: 126

New improved version, now with assembly source.

Earlier I wrote:
> 
> I don't know about Microport, but I have found that a LOT of time
> is spent doing long shifts on my Xenix system when I use a 16-bit
> compress.  This is in part because the C compiler generates a call
> to a routine to do long shifts.  What is worse, they coded the
> routine to be space efficient rather than time efficient: it is a loop
> of only 3 or 4 286 instructions that runs once for every bit of the
> shift count, i.e. it shifts one bit each time through the loop.  I
> found that I could write my
> own routine, using a grand total of 50 or so more bytes, and in doing
> so I decreased the time required to do a 16-bit compress by about 30%!
> 
> I don't have the code in front of me but the basic idea was to use
> the 16-bit shift instructions and OR together the appropriate results.
> I suspect that for 1-bit and possibly 2-bit shifts the provided routine
> is faster, but compress does a lot of shifts of 10 bits or more, and for
> these, my routine wins by a BIG margin.
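The idea in the quoted text can be sketched in C. This is an illustration, not the Microsoft library code: `ulshr_loop` is modeled on the description of the original one-bit-per-iteration helper, and `ulshr_split` shows the replacement strategy of shifting each 16-bit half once and ORing in the bits that cross between the halves. Both function names are hypothetical, and the shift count is assumed to be in the range 0..31.

```c
#include <stdint.h>

/* Modeled on the described library helper: one iteration per bit of
   shift count, so the cost grows with the count. */
static uint32_t ulshr_loop(uint32_t x, unsigned n)
{
    while (n-- > 0)
        x >>= 1;
    return x;
}

/* Split strategy: treat x as two 16-bit halves (hi:lo), shift each half
   once, and OR in the bits that cross from hi into lo.
   Assumes 0 <= n <= 31. */
static uint32_t ulshr_split(uint32_t x, unsigned n)
{
    uint16_t lo = (uint16_t)x;
    uint16_t hi = (uint16_t)(x >> 16);

    if (n > 15) {
        /* Only the high word survives; it moves into the low word. */
        lo = (uint16_t)(hi >> (n - 16));
        hi = 0;
    } else {
        /* Bits leaving the bottom of hi enter the top of lo.  For n == 0
           the 32-bit shift by 16 is truncated away, so lo is unchanged. */
        lo = (uint16_t)((lo >> n) | ((uint32_t)hi << (16 - n)));
        hi = (uint16_t)(hi >> n);
    }
    return ((uint32_t)hi << 16) | lo;
}
```

For example, `ulshr_split(0x12345678, 12)` gives 0x12345.  The loop version runs 12 iterations for that count, while the split version does a fixed handful of 16-bit operations whatever the count is, which is why the gain shows up on the long shifts compress does.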

I received several replies requesting the source.  Again, I must caution
that these routines are designed to work with Microsoft Xenix 2.0.
I don't have any idea whether they work with any other compiler/os.

It turns out that the Microsoft compiler uses a non-standard calling
sequence for its own built-in routines, including the long shift
operations.  On entry, these routines expect the value to be shifted in
the AX (low-order word) and DX (high-order word) registers (that is
where the Microsoft C compiler I'm using puts them) and the number of
bits to shift in the CL register.  They return the result in DX:AX.
They destroy the CH register (I'm not positive that this is really safe,
but it works for the programs I have tried!).  I assemble the following
with 'as' and link it with the compress source.  Best of luck.  Remember
to test it well before giving it any trust.
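One way to test the register-level logic before trusting it is to model it in C and compare against native 32-bit shifts.  The sketch below is hypothetical: `struct regs`, `ulshr_model`, `lshl_model`, `load` and `result` are illustrative names, not part of any compiler interface.  It mirrors the `__ulshr` and `__lshl` routines instruction by instruction, assuming shift counts of 0..31 (for those counts the assembly's signed `jle` test agrees with the unsigned `> 15` test used here).

```c
#include <stdint.h>

/* Hypothetical model of the registers the helpers use. */
struct regs {
    uint16_t ax;   /* low-order word of the long */
    uint16_t dx;   /* high-order word of the long */
    uint8_t  cl;   /* shift count, assumed 0..31 */
    uint8_t  ch;   /* scratch byte the helpers overwrite */
};

/* 16-bit logical shifts for counts 0..16; a count of 16 clears the word,
   as it does on the 286. */
static uint16_t shl16(uint16_t v, unsigned n) { return (uint16_t)((uint32_t)v << n); }
static uint16_t shr16(uint16_t v, unsigned n) { return (uint16_t)((uint32_t)v >> n); }

/* Mirrors __ulshr: DX:AX = DX:AX >> CL (unsigned). */
static void ulshr_model(struct regs *r)
{
    if (r->cl > 15) {                    /* cmp cl,15 ; jle */
        uint16_t t;
        r->cl -= 16;                     /* sub cl,16 */
        t = r->ax; r->ax = r->dx; r->dx = t;  /* xchg ax,dx */
        r->ax = shr16(r->ax, r->cl);     /* shr ax,cl */
        r->dx = 0;                       /* sub dx,dx */
    } else {
        uint16_t saved;
        r->ch = r->cl;                   /* mov ch,cl */
        saved = r->dx;                   /* push dx */
        r->ax = shr16(r->ax, r->cl);     /* shr ax,cl */
        r->cl = (uint8_t)(16 - r->cl);   /* sub cl,16 ; neg cl */
        r->dx = shl16(r->dx, r->cl);     /* shl dx,cl */
        r->ax |= r->dx;                  /* or ax,dx */
        r->dx = saved;                   /* pop dx */
        r->cl = r->ch;                   /* mov cl,ch */
        r->dx = shr16(r->dx, r->cl);     /* shr dx,cl */
    }
}

/* Mirrors __lshl: DX:AX = DX:AX << CL. */
static void lshl_model(struct regs *r)
{
    if (r->cl > 15) {
        r->cl -= 16;                     /* sub cl,16 */
        r->dx = shl16(r->ax, r->cl);     /* mov dx,ax ; shl dx,cl */
        r->ax = 0;                       /* sub ax,ax */
    } else {
        uint16_t saved;
        r->ch = r->cl;                   /* mov ch,cl */
        saved = r->ax;                   /* push ax */
        r->dx = shl16(r->dx, r->cl);     /* shl dx,cl */
        r->cl = (uint8_t)(16 - r->cl);   /* sub cl,16 ; neg cl */
        r->ax = shr16(r->ax, r->cl);     /* shr ax,cl */
        r->dx |= r->ax;                  /* or dx,ax */
        r->ax = saved;                   /* pop ax */
        r->cl = r->ch;                   /* mov cl,ch */
        r->ax = shl16(r->ax, r->cl);     /* shl ax,cl */
    }
}

/* Load a 32-bit value and count the way the compiler passes them. */
static struct regs load(uint32_t x, unsigned n)
{
    struct regs r = { (uint16_t)x, (uint16_t)(x >> 16), (uint8_t)n, 0 };
    return r;
}

/* Read the DX:AX result back as one 32-bit value. */
static uint32_t result(const struct regs *r)
{
    return ((uint32_t)r->dx << 16) | r->ax;
}
```

Running every count from 0 to 31 through both models and comparing with `x >> n` and `x << n` exercises both the small-count and large-count paths, including the count-0 case where the code shifts a word by 16.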
 
 		David Hutchens
 		hutch@hubcap.clemson.edu
 		...!gatech!hubcap!hutch


----------  CUT HERE  -----------
;	Static Name Aliases
;
	TITLE   shift

	.287
_TEXT	SEGMENT  BYTE PUBLIC 'CODE'
_TEXT	ENDS
CONST	SEGMENT  WORD PUBLIC 'CONST'
CONST	ENDS
_BSS	SEGMENT  WORD PUBLIC 'BSS'
_BSS	ENDS
DGROUP	GROUP	CONST,	_BSS
	ASSUME  CS: _TEXT, DS: DGROUP, SS: DGROUP, ES: DGROUP
_TEXT      SEGMENT
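; __lshr: signed (arithmetic) long shift right.
; In:  DX:AX = value, CL = count (0..31).  Out: DX:AX = value >> count.
; CH is not preserved.
	PUBLIC	__lshr
__lshr	PROC FAR
	cmp	cl,15
	jle	$LSRSMALL	; count 0..15: both words contribute
	sub	cl,16		; count 16..31: only the high word survives
	xchg	ax,dx
	sar	ax,cl		; old high word >> (count - 16) becomes the low word
	cwd			; fill DX with the sign of AX
	ret
$LSRSMALL:
	mov	ch,cl		; save count
	push	dx
	shr	ax,cl		; low word >> count
	sub	cl,16
	neg	cl		; CL = 16 - count (a count of 16 clears the word)
	shl	dx,cl		; high-word bits that move into the low word
	or	ax,dx
	pop	dx
	mov	cl,ch		; restore count
	sar	dx,cl		; high word >> count, keeping the sign
	ret
__lshr	ENDP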

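; __ulshr: unsigned (logical) long shift right.
; In:  DX:AX = value, CL = count (0..31).  Out: DX:AX = value >> count.
; CH is not preserved.
	PUBLIC	__ulshr
__ulshr	PROC FAR
	cmp	cl,15
	jle	$ULSRSMALL	; count 0..15: both words contribute
	sub	cl,16		; count 16..31: only the high word survives
	xchg	ax,dx
	shr	ax,cl		; old high word >> (count - 16) becomes the low word
	sub	dx,dx		; high word is zero
	ret
$ULSRSMALL:
	mov	ch,cl		; save count
	push	dx
	shr	ax,cl		; low word >> count
	sub	cl,16
	neg	cl		; CL = 16 - count (a count of 16 clears the word)
	shl	dx,cl		; high-word bits that move into the low word
	or	ax,dx
	pop	dx
	mov	cl,ch		; restore count
	shr	dx,cl		; high word >> count
	ret
__ulshr	ENDP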

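; __lshl: long shift left.
; In:  DX:AX = value, CL = count (0..31).  Out: DX:AX = value << count.
; CH is not preserved.
	PUBLIC	__lshl
__lshl	PROC FAR
	cmp	cl,15
	jle	$LSLSMALL	; count 0..15: both words contribute
	sub	cl,16		; count 16..31: only the low word survives
	mov	dx,ax
	shl	dx,cl		; old low word << (count - 16) becomes the high word
	sub	ax,ax		; low word is zero
	ret
$LSLSMALL:
	mov	ch,cl		; save count
	push	ax
	shl	dx,cl		; high word << count
	sub	cl,16
	neg	cl		; CL = 16 - count (a count of 16 clears the word)
	shr	ax,cl		; low-word bits that move into the high word
	or	dx,ax
	pop	ax
	mov	cl,ch		; restore count
	shl	ax,cl		; low word << count
	ret
__lshl	ENDP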
_TEXT	ENDS
END
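The signed routine __lshr is the only one of the three that has to propagate a sign bit: it uses sar on the high word and cwd to build an all-ones or all-zeros high word when the count is 16 or more.  A C sketch of those two paths follows, so they can be checked against a reference arithmetic shift on a modern machine.  The names `sar16` and `lshr_model` are hypothetical, and counts are assumed to be 0..31 as in the other routines.

```c
#include <stdint.h>

/* Emulates a 16-bit arithmetic right shift ("sar") for counts 0..15,
   without relying on C's implementation-defined signed right shift. */
static uint16_t sar16(uint16_t v, unsigned n)
{
    uint16_t r = (uint16_t)(v >> n);
    if ((v & 0x8000u) && n > 0)
        r |= (uint16_t)((uint32_t)0xFFFFu << (16 - n));  /* copy sign bit in */
    return r;
}

/* Mirrors __lshr: returns the signed DX:AX >> n packed as DX:AX. */
static uint32_t lshr_model(uint16_t ax, uint16_t dx, unsigned n)
{
    if (n > 15) {
        ax = sar16(dx, n - 16);                /* xchg ax,dx ; sar ax,cl */
        dx = (ax & 0x8000u) ? 0xFFFFu : 0;     /* cwd: sign-extend AX into DX */
    } else {
        /* shr ax,cl ; shl dx,(16 - cl) ; or ax,dx */
        uint16_t lo = (uint16_t)(((uint32_t)ax >> n) | ((uint32_t)dx << (16 - n)));
        dx = sar16(dx, n);                     /* sar dx,cl: keep the sign */
        ax = lo;
    }
    return ((uint32_t)dx << 16) | ax;
}
```

For a negative value x, a portable reference for the arithmetic shift is `~(~x >> n)` computed on the unsigned bit pattern, which is what a check of this model can compare against.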