Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uwm.edu!uakari.primate.wisc.edu!brutus.cs.uiuc.edu!apple!vsi1!wyse!mips!servitude!rogerk
From: rogerk@mips.COM (Roger B.A. Klorese)
Newsgroups: comp.sys.mips
Subject: Re: Dump program causes system crash
Message-ID: <34612@mips.mips.COM>
Date: 17 Jan 90 20:16:01 GMT
References: <1990Jan16.220024.2485@alberta.uucp>
Sender: news@mips.COM
Reply-To: rogerk@mips.COM (Roger B.A. Klorese)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 64

In article <1990Jan16.220024.2485@alberta.uucp> gordon@cs.UAlberta.CA (Gordon Atwood) writes:
>Both machines experience what appears to be an identical problem. They both
>crash when a filesystem is dumped using the raw device name.

>I noticed that the new Sys V flavor didn't provide raw device names
>for the filesystems...

This is not true. The devices merely follow the System V organization for
devices. /dev/dsk contains the block devices, and /dev/rdsk contains the
raw devices.

>Even more interesting is that when the m1000 was running the BSD 4.3 OS
>the corresponding dump program was quite happy with both the block and the
>raw device filesystems.

That's because these are totally different ports and kernels. We didn't
build RISC/os from UMIPS-BSD. We built it from UMIPS-V and re-ported many
BSD commands, system calls, etc. to it.

>The only other clue that I can provide is the error message which appears
>on the console just prior to the crash. The messages was
> "assertion failed !pg_ismod(pd), file: fault.c line 284"

Thanks for this piece of information. It's an *extremely* important one.
Please be sure to report anything you think could even possibly be remotely
relevant to a problem; often it's not clear where the real clue is.

>I have two suspicions: 1) Since the filesystem is active, perhaps the os
>is detecting that a disk block has been read which is already been modified
>in the real memory (ie flagged as modified).
>
>2) The raw device I/O has a bug.
>
>Any thoughts would be gratefully received.

OK, the answer is:

3) In 4.0-based releases, a bug was introduced which could cause the
in-memory page tables and the TLB to become inconsistent under some
circumstances. The two most visible symptoms are either the assertion
failure on !pg_ismod (in a few different code locations) or hangs of 5 to
15 minutes' duration where the system "goes catatonic" (no apparent action
at all) and spontaneously revives. It does *not*, however, produce
permanent or hidden effects such as broken filesystems or incorrect
calculations.

This will be fixed in the forthcoming release 4.50. In addition, the fix
will be provided in a "patch release" now being tested, which should become
available in two to three weeks.

Since patch releases do not undergo the same full QA cycle as major and
minor releases, we prefer that users who are not strongly inconvenienced by
the problems addressed in the patch release not install it, as there is
always the possibility that they will turn up some code regression that our
testing did not detect. So we are asking that people who are getting
unavoidable panics from this problem, or the hangs, install the patch, but
that users whose problems have a workaround (such as using the block device
for dump) wait for 4.50.

By the way, not that we're hiding this issue, but if you have a support
contract for RISC/os, you probably would have gotten a more direct and
immediate response from the CRC at 1-800-443-MIPS.
-- 
ROGER B.A. KLORESE      MIPS Computer Systems, Inc.   phone: +1 408 720-2939
928 E. Arques Ave.  Sunnyvale, CA  94086                   rogerk@mips.COM
{ames,decwrl,pyramid}!mips!rogerk
"Two guys, one cart, fresh pasta... *you* figure it out." -- Suzanne Sugarbaker