Path: utzoo!attcan!uunet!zephyr.ens.tek.com!orca.wv.tek.com!frip!andrew
From: andrew@frip.WV.TEK.COM (Andrew Klossner)
Newsgroups: comp.sys.m88k
Subject: Re: Grabbing arithmetic overflow traps ?
Message-ID: <7559@orca.wv.tek.com>
Date: 29 Jun 90 16:58:18 GMT
References: <9724@discus.technion.ac.il> <3418@oakhill.UUCP> <1990Jun27.173213.8250@dg-rtp.dg.com> <AGLEW.90Jun27202626@basagran.csg.uiuc.edu> <9733@discus.technion.ac.il>
Sender: andrew@orca.wv.tek.com
Reply-To: andrew@frip.wv.tek.com
Organization: Tektronix, Wilsonville, Oregon
Lines: 76

We've given some thought to these problems, but haven't implemented
anything.  Some observations:

Yes, signal overhead is huge.  88k exception overhead is pretty large
all by itself.  On any exception, you've got to clean out the
pipelines.  There can be up to three data loads/stores suspended in
flight, so you have to relaunch them, and two of them might cause data
access exceptions (if, for example, they refer to invalid virtual
addresses).  You must also clean out the floating point unit pipelines
and deal with any exceptions arising from this.  This is tricky code to
get right, and it must execute with shadow registers frozen, so you
can't use a conventional debugger.  (However, a Tektronix DAS 9200
ICEbox is quite useful in these circumstances -- end of commercial.)

If you want to plug into the kernel's floating point exception handler,
you'll likely find yourself operating within this constricted
environment.  I've been hacking 88k kernels for three years, but I
wouldn't want to take on this task.

Here are a couple of alternative means to your end:

1:  Non-trap overflow detection.  Yes, you lose cycles if you have to
follow every addu and subu with a conditional branch.  It helps that a
non-taken conditional branch eats only one cycle.  If you're taking
5000 overflows a second, that's one or more overflows per 1000
operations.  A one-cycle check on each operation then costs at most
1000 cycles per overflow caught, so if you can detect overflow with a
single one-cycle conditional branch, you'll do at least as well as if
you install a trap handler that takes 1000 or more cycles to complete.
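
As a rough illustration of alternative 1, here is a minimal C sketch of
an add that never traps and reports overflow through a return value.
The name checked_add is made up for this example.  The sum is formed in
unsigned arithmetic (the C analogue of addu), and the sign-bit test is
the usual two's-complement trick; it takes a couple of logical
operations before the branch, so it is not the single one-cycle branch
hoped for above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add two signed 32-bit values without trapping.
 * Stores the wrapped sum in *sum and returns 1 if the signed addition
 * overflowed, 0 otherwise. */
static int checked_add(int32_t a, int32_t b, int32_t *sum)
{
    assert(sum != NULL);

    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t us = ua + ub;          /* unsigned add wraps, never traps */

    /* Signed overflow happened iff both operands have the same sign and
     * the result's sign differs from it.  In that case bit 31 is set in
     * both (a ^ s) and (b ^ s). */
    int overflow = (int)(((ua ^ us) & (ub ^ us)) >> 31);

    /* Converting an out-of-range value back to int32_t is
     * implementation-defined in older C standards; this sketch assumes
     * the usual two's-complement wraparound. */
    *sum = (int32_t)us;
    return overflow;
}
```

For example, checked_add(INT32_MAX, 1, &s) returns 1 and leaves the
wrapped value in s, while checked_add(40, 2, &s) returns 0 with s equal
to 42.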

2: Restrictive overflow handling.  Arrange that all pipelines will be
empty when you perform an add or subtract, either by sophisticated
instruction scheduling or by using "tb1 0,r0,0" instructions to wall
off your add/sub code from loads, stores, and floating point or
multiply/divide instructions.  Modify the kernel so that, when your
process is executing, integer overflow exceptions are delivered
directly to you, bypassing pipeline correction.  The kernel code might
implement this by changing the code at the integer overflow exception
vector to something like this:

vector+0x48:
	br.n	custom_int_overflow
	stcr	r1,sr0		; Save user's r1.

	...

custom_int_overflow:
	subu	r31,r31,4	; Stack the SNIP -- address of instruction
	ldcr	r1,snip		;   about to be executed.
	st.usr	r1,r31,r0	; System will take ERR exception and crash
				;   if user's r31 is invalid.
	subu	r31,r31,4	; Stack the SXIP -- address of faulting
	ldcr	r1,sxip		;   instruction.
	st.usr	r1,r31,r0
				; Fetch the address of the user's exception
				;   handler.
	or.u	r1,r0,hi16(u.u_int_handler)
	ld	r1,r1,lo16(u.u_int_handler)
				; Fill load shadow with as many instructions
				;   as possible:
	stcr	r0,snip		; Clear valid bit in SNIP.
	stcr	r0,ssbr		; Wipe out all shadow scoreboard bits.
				;   Otherwise system will hang at RTE if
				;   pipelines were not in fact empty.
	or	r1,r1,2		; Turn on the "valid" bit, and arrange that
	stcr	r1,sfip		;   execution will resume at user's exception
				;   handler.
	ldcr	r1,sr0		; Restore user's r1.
	rte			; Return to user code.

We call this "lightweight exception dispatch."  Several further
simplifications are possible.  When any process other than yours is
running, the code at vector+0x48 would instead branch to the usual
kernel overflow handler.
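
To make the state that the user's handler sees a little more concrete,
the custom_int_overflow stub can be modeled in C.  This is only a toy
simulation under stated assumptions: struct cpu_model, the flat
user_mem array, push and dispatch_overflow are all invented for
illustration, nothing here touches real control registers, and the r1
save/restore through sr0 and the st.usr fault path are left out.  It
takes the valid bit to be the value 2, as in the "or r1,r1,2" above.

```c
#include <assert.h>
#include <stdint.h>

#define VALID_BIT 0x2u   /* "valid" bit in SNIP/SFIP, as set by the stub */
#define MEM_WORDS 64     /* size of the toy user memory, in words */

/* Hypothetical model of the state the stub reads and writes. */
struct cpu_model {
    uint32_t r31;                 /* user stack pointer (byte address) */
    uint32_t sxip, snip, sfip;    /* shadow instruction pointers */
    uint32_t ssbr;                /* shadow scoreboard register */
    uint32_t user_mem[MEM_WORDS]; /* toy user address space */
};

/* Models "subu r31,r31,4" followed by "st.usr r1,r31,r0". */
static void push(struct cpu_model *c, uint32_t value)
{
    c->r31 -= 4;
    assert(c->r31 % 4 == 0 && c->r31 / 4 < MEM_WORDS);
    c->user_mem[c->r31 / 4] = value;
}

/* Models the stub: stack SNIP, then SXIP, cancel the pending
 * instruction, clear the scoreboard, and resume at the handler. */
static void dispatch_overflow(struct cpu_model *c, uint32_t handler)
{
    push(c, c->snip);               /* instruction about to be executed */
    push(c, c->sxip);               /* faulting instruction */
    c->snip = 0;                    /* clear valid bit in SNIP */
    c->ssbr = 0;                    /* wipe shadow scoreboard bits */
    c->sfip = handler | VALID_BIT;  /* rte resumes at the handler */
}
```

In this model, on entry to the handler the word at r31 holds the saved
SXIP and the word at r31+4 holds the saved SNIP, which is the order the
stub stacks them in.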

  -=- Andrew Klossner   (uunet!tektronix!frip.WV.TEK!andrew)    [UUCP]
                        (andrew%frip.wv.tek.com@relay.cs.net)   [ARPA]