Xref: utzoo comp.arch:17298 comp.lang.functional:316 comp.lang.lisp:3436 comp.lang.prolog:2974
Path: utzoo!attcan!uunet!auspex!guy
From: guy@auspex.auspex.com (Guy Harris)
Newsgroups: comp.arch,comp.lang.functional,comp.lang.lisp,comp.lang.prolog
Subject: Re: GC triggering and stack limit checking by MMU hardware
Keywords: GC, stack, heap, MMU
Message-ID: <3729@auspex.auspex.com>
Date: 23 Jul 90 17:49:31 GMT
References: <1990Jul19.151524.22544@diku.dk> <11075@alice.UUCP>
Followup-To: comp.arch
Organization: Auspex Systems, Santa Clara
Lines: 58

>The problem is that certain machine architectures (eg. Motorola 68K) and OS
>implementations (eg. SunOS at least in some versions) do not provide
>a continuable address violation signal (SIGSEGV), even though at the kernel
>level, address translation faults (page faults) are continuable. Not having
>looked at the insides of those machine/OS combinations, I suspect that
>enough instruction execution context can be saved for filling a page fault
>in the kernel, but not enough for reentering the faulting process to
>handle a Unix signal and allocate the needed storage.

The amount of instruction execution context is the same in both cases;
the only difference is where it has to be stored.  I think SunOS 4.1 may
store enough of it outside the kernel stack to permit *one* such fault
to be continued from (i.e., don't expect to be able to return from a
SIGSEGV that occurs while you're handling a SIGSEGV).

Part of the problem is that Motorola:

1) wouldn't commit to the "stack puke" stored by the 680[1andup]0
   being "safe" to hand to user-mode code; i.e., they wouldn't say
   "nothing you can do to the 'stack puke' is risky";

and

2) wouldn't describe the format of the "stack puke" to the extent
   necessary to have the kernel validate it.

I can see their not doing so as being perfectly legitimate; for all I
know, different revisions of the same chip may have different "stack
puke" formats, and even if they don't, they might not want any of that
stuff to be considered a "feature", and have people then write code
dependent on that stuff and lock them into continuing to provide those
"features".  It does, however, complicate the task of allowing user-mode
code to continue from a SIGSEGV.

>These observations are the result of practical experiments 5 or so years
>ago with Sun 2's and VAXen running Berkeley Unix. The former could not
>recover correctly from the segmentation violation (PC corrupted on return
>from the signal), the latter could.

The former has more context than the latter; the former has the 68010
"stack puke", the latter has, as I remember, just the First Part Done
bit (and some of the general-purpose registers, for some of the
long-running instructions like MOVCn).

The VAX's context is safe and, I think, appears in the "signal context"
saved by a BSD signal, so that the instruction can be continued from
user mode without the kernel having to tuck away one or more sets of
context.

>Newer machines/architectures might handle this better,

I think most RISC machines (not entirely surprisingly) have less or no
context of that sort; I'd expect things to work OK on a SPARC-based Sun,
for example, as well as a MIPS-based machine. 

In fact, what architectures other than the 68K architectures have lots
of context for that?  I don't think the 386 or the WE32K, for example,
have that problem.