Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!uakari.primate.wisc.edu!caen!umich!terminator!merit.edu!rsc
From: rsc@merit.edu (Richard Conto)
Newsgroups: comp.arch
Subject: Re: Workstation Data Integrity
Keywords: Parity, crashing, abnormal termination
Message-ID: <1990Aug29.165548.10599@terminator.cc.umich.edu>
Date: 29 Aug 90 16:55:48 GMT
References: <1990Aug3.204358.330@portia.Stanford.EDU> <40694@mips.mips.COM> <2399@crdos1.crd.ge.COM> <1990Aug10.171744.9639@zoo.toronto.edu> <2421@crdos1.crd.ge.COM> <1990Aug18.210132.25203@sco.COM> <2434@crdos1.crd.ge.COM> <6797.26d6edce@vax1.tcd.ie> <24 <3294@awdpr
Sender: usenet@terminator.cc.umich.edu (usenet news)
Reply-To: rsc@merit.edu (Richard Conto)
Organization: U of Michigan, Merit Network
Lines: 28

In article <3294@awdprime.UUCP> tif@doorstop.austin.ibm.com (Paul Chamberlain/32767) writes:
>I'm sorry, but I have to go into reality mode here. I can understand
>if you were running a simulation on the space shuttle you'd rather
>get no answer than a wrong answer. But let's say you were doing something
>more typical, like ... oh ... replying to a long article in news. You've
>been typing and researching for an hour now. I ask you this: would you
>rather I just blow away that entire article and crash your machine or change
>a single random character?

There are more choices than that. If your news is running on a multitasking
machine, I'd hope that the kernel would be able to terminate the task (if the
parity error occurred in task memory rather than kernel memory).

But think. A parity error MAY occur in the text being manipulated. But it can
also occur in worse places. It could corrupt the data structures in your news
program, leading to an eventual core dump (but not right away). If you're
keeping track of core dumps (for whatever reason), do you want to waste time
tracking down an obscure bug like that?

If the memory is in user-space, the kernel should at the very least kill the task.
If it doesn't check the page of memory that caused the parity error, it should
(at the very least) never re-allocate that page, and log a nasty message on
the operator's console.

If the error occurs in kernel-space, it should try for as graceful a shutdown
as it can, which may mean printing a very nasty message on the operator's
console and halting, since it can't trust its disk system anymore.

--- Richard
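
P.S. The policy above can be put as a toy model. This is only a sketch, not any real kernel's interface: the names ParityPolicy, Region, Action, handle and can_allocate are all made up for illustration, and it assumes the kernel does NOT test the faulting page, so the page is always retired.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Region(Enum):
    """Where the parity error was detected (hypothetical classification)."""
    USER = auto()
    KERNEL = auto()


class Action(Enum):
    """Responses named in the post (hypothetical labels)."""
    KILL_TASK = auto()
    RETIRE_PAGE = auto()
    LOG_TO_CONSOLE = auto()
    HALT = auto()


@dataclass
class ParityPolicy:
    # Pages that must never be handed out again.
    retired_pages: set = field(default_factory=set)
    # Stand-in for the operator's console.
    console: list = field(default_factory=list)

    def handle(self, region, page, task=None):
        """Return the actions taken for a parity error on `page`."""
        if region is Region.KERNEL:
            # Kernel memory is suspect: log loudly and stop, since the
            # disk system can no longer be trusted.
            self.console.append(f"PARITY ERROR in kernel page {page}: halting")
            return [Action.LOG_TO_CONSOLE, Action.HALT]
        # User memory: kill the task, and (assuming the page is not
        # tested) never re-allocate it.
        self.retired_pages.add(page)
        self.console.append(
            f"PARITY ERROR in page {page}: task {task} killed, page retired"
        )
        return [Action.KILL_TASK, Action.RETIRE_PAGE, Action.LOG_TO_CONSOLE]

    def can_allocate(self, page):
        """A retired page is never eligible for allocation again."""
        return page not in self.retired_pages
```

The point of the split is that only the kernel-space case takes the whole machine down; a user-space error costs one task and one page.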