Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!mcgill-vision!snorkelwacker!think!samsung!xylogics!transfer!lectroid!bigbootay!dswartz
From: dswartz@bigbootay.sw.stratus.com (Dan Swartzendruber)
Newsgroups: comp.arch
Subject: Re: GC triggering and stack limit checking by MMU hardware
Keywords: GC, stack, heap, MMU
Message-ID: <1785@lectroid.sw.stratus.com>
Date: 23 Jul 90 18:24:46 GMT
References: <1990Jul19.151524.22544@diku.dk> <11075@alice.UUCP> <3729@auspex.auspex.com>
Sender: usenet@lectroid.sw.stratus.com
Reply-To: dswartz@bigbootay.sw.stratus.com (Dan Swartzendruber)
Organization: Stratus Computer, Software Engineering.
Lines: 72

In article <3729@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
::The problem is that certain machine architectures (e.g. Motorola 68K) and OS
::implementations (e.g. SunOS, at least in some versions) do not provide
::a continuable address violation signal (SIGSEGV), even though at the kernel
::level, address translation faults (page faults) are continuable. Not having
::looked at the insides of those machine/OS combinations, I suspect that
::enough instruction execution context can be saved for filling a page fault
::in the kernel, but not enough for reentering the faulting process to
::handle a Unix signal and allocate the needed storage.
:
:The amount of instruction execution context is the same in both cases;
:the only difference is where it has to be stored.  I think SunOS 4.1 may
:store enough of it outside the kernel stack to permit *one* such fault
:to be continued from (i.e., don't expect to be able to return from a
:SIGSEGV that occurs while you're handling a SIGSEGV).
:
:Part of the problem is that Motorola:
:
:1) wouldn't commit to the "stack puke" stored by the 680[1andup]0
:   being "safe" to hand to user-mode code; i.e., they wouldn't say
:   "nothing you can do to the 'stack puke' is risky";
:
:and
:
:2) wouldn't describe the format of the "stack puke" to the extent
:   necessary to have the kernel validate it.
:
:I can see their not doing so as being perfectly legitimate; for all I
:know, different revisions of the same chip may have different "stack
:puke" formats, and even if they don't, they might not want any of that
:stuff to be considered a "feature", and have people then write code
:dependent on that stuff and lock them into continuing to provide those
:"features".  It does, however, complicate the task of allowing user-mode
:code to continue from a SIGSEGV.

This is true!  I have been at companies doing 680x0 products where you
had to be absolutely sure that all processors on the machine were running
compatible versions of microcode.  Otherwise you could get into a
scenario where processor #1 takes a page fault and dumps "stack puke",
the page fault is serviced, and the user process is restarted on a
processor whose microcode expects a different "stack puke" format.  The
net result is not good!

:
::These observations are the result of practical experiments 5 or so years
::ago with Sun 2's and VAXen running Berkeley Unix. The former could not
::recover correctly from the segmentation violation (PC corrupted on return
::from the signal), the latter could.
:
:The former has more context than the latter; the former has the 68010
:"stack puke", the latter has, as I remember, just the First Part Done
:bit (and some of the general-purpose registers, for some of the
:long-running instructions like MOVCn).
:
:The latter is safe and, I think, appears in the "signal context"
:saved by a BSD signal, so that the instruction can be continued from
:user mode without the kernel having to tuck away one or more sets of
:context.
:
::Newer machines/architectures might handle this better,
:
:I think most RISC machines (not entirely surprisingly) have little or no
:context of that sort; I'd expect things to work OK on a SPARC-based Sun,
:for example, as well as a MIPS-based machine. 
:
:In fact, what architectures other than the 68K architectures have lots
:of context for that?  I don't think the 386 or the WE32K, for example,
:have that problem.


--

Dan S.