Path: utzoo!attcan!uunet!zephyr.ens.tek.com!orca.wv.tek.com!frip!andrew
From: andrew@frip.WV.TEK.COM (Andrew Klossner)
Newsgroups: comp.sys.m88k
Subject: Re: Grabbing arithmetic overflow traps ?
Message-ID: <7559@orca.wv.tek.com>
Date: 29 Jun 90 16:58:18 GMT
References: <9724@discus.technion.ac.il> <3418@oakhill.UUCP> <1990Jun27.173213.8250@dg-rtp.dg.com> <9733@discus.technion.ac.il>
Sender: andrew@orca.wv.tek.com
Reply-To: andrew@frip.wv.tek.com
Organization: Tektronix, Wilsonville, Oregon
Lines: 76

We've given some thought to these problems, but haven't implemented
anything.  Some thoughts:

Yes, signal overhead is huge.  88k exception overhead is pretty large
all by itself.  On any exception, you've got to clean out the
pipelines.  There can be up to three data loads/stores suspended in
flight, so you have to relaunch them, and two of them might cause data
access exceptions (if, for example, they refer to invalid virtual
addresses).  You must also clean out the floating point unit pipelines
and deal with any exceptions arising from this.

This is tricky code to get right, and it must execute with shadow
registers frozen, so you can't use a conventional debugger.  (However,
a Tektronix DAS 9200 ICEbox is quite useful in these circumstances --
end of commercial.)  If you want to plug into the kernel's floating
point exception handler, you'll likely find yourself operating within
this constricted environment.  I've been hacking 88k kernels for three
years, but I wouldn't want to take on this task.

Here are a couple of alternative means to your end:

1: Non-trap overflow detection.  Yes, you lose cycles if you have to
follow every addu and subu with conditional branching.  It helps that a
non-taken conditional branch eats only one cycle.  If you're taking
5000 overflows a second, that's one or more overflows per 1000
operations.
If you can detect overflow with a single one-cycle conditional
branch, you'll do as well as if you install a trap handler that takes
1000 or more cycles to complete.

2: Restrictive overflow handling.  Arrange that all pipelines will be
empty when you perform an add or subtract, either by sophisticated
instruction scheduling or by using "tb1 0,r0,0" instructions to wall
off your add/sub code from loads, stores, and floating point or
multiply/divide instructions.  Modify the kernel so that, when your
process is executing, integer overflow exceptions are delivered
directly to you, bypassing pipeline correction.  The kernel code might
implement this by changing the code at the integer overflow exception
vector to something like this:

    vector+0x48:
        br.n    custom_int_overflow
        stcr    r1,sr0          ; Save user's r1.
        ...

    custom_int_overflow:
        subu    r31,r31,4       ; Stack the SNIP -- address of instruction
        ldcr    r1,snip         ; about to be executed.
        st.usr  r1,r31,r0       ; System will take ERR exception and crash
                                ; if user's r31 is invalid.
        subu    r31,r31,4       ; Stack the SXIP -- address of faulting
        ldcr    r1,sxip         ; instruction.
        st.usr  r1,r31,r0
                                ; Fetch the address of the user's exception
                                ; handler.
        or.u    r1,0,hi16(u.u_int_handler)
        ld      r1,r1,lo16(u.u_int_handler)
                                ; Fill load shadow with as many instructions
                                ; as possible:
        stcr    r0,snip         ; Clear valid bit in SNIP.
        stcr    r0,ssbr         ; Wipe out all shadow scoreboard bits.
                                ; Otherwise system will hang at RTE if
                                ; pipelines were not in fact empty.
        or      r1,r1,2         ; Turn on the "valid" bit, and arrange that
        stcr    r1,sfip         ; execution will resume at user's exception
                                ; handler.
        ldcr    r1,sr0          ; Restore user's r1.
        rte                     ; Return to user code.

We call this "lightweight exception dispatch."  Several further
simplifications are possible.

When any process other than yours is running, the code at vector+0x48
would point to the usual kernel overflow handler.

  -=- Andrew Klossner   (uunet!tektronix!frip.WV.TEK!andrew)    [UUCP]
                        (andrew%frip.wv.tek.com@relay.cs.net)   [ARPA]
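
As an illustration of alternative 1 (non-trap overflow detection), here
is a minimal sketch in C rather than 88k assembly.  It is not from the
kernel or from any compiler discussed above; the function names
add_overflows and sub_overflows are hypothetical.  It assumes 32-bit
two's-complement values and does the arithmetic in unsigned form, the
way addu and subu wrap without trapping, then inspects the sign bits.
Note that this particular test costs two XORs and an AND before the
branch, so it is more expensive than the single one-cycle branch
assumed in the break-even estimate.

```c
#include <stdint.h>

/* Hypothetical helper: wrapping 32-bit add, as a non-trapping add would do.
   Stores the wrapped sum in *sum and returns 1 if the operands, read as
   signed two's-complement values, overflowed; returns 0 otherwise. */
int add_overflows(uint32_t a, uint32_t b, uint32_t *sum)
{
    uint32_t s = a + b;               /* unsigned add wraps, never traps */
    *sum = s;
    /* Signed overflow happened iff the result's sign differs from the
       sign of both operands. */
    return (int)(((s ^ a) & (s ^ b)) >> 31);
}

/* Hypothetical helper: wrapping 32-bit subtract with the same convention. */
int sub_overflows(uint32_t a, uint32_t b, uint32_t *diff)
{
    uint32_t d = a - b;               /* unsigned subtract wraps, never traps */
    *diff = d;
    /* Signed overflow happened iff the operands have different signs and
       the result's sign differs from the sign of a. */
    return (int)(((a ^ b) & (a ^ d)) >> 31);
}
```

A caller would branch on the return value, which is the conditional
branch that the cycle estimate charges to every add or subtract.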