Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!wuarchive!cs.utexas.edu!rice!rice!sun-spots-request
From: fuat@cunixf.cc.columbia.edu (Fuat C. Baran)
Newsgroups: comp.sys.sun
Subject: SunOS 4.1 multi-user dump causes crashes (RESOLV
Keywords: SunOS
Message-ID: <1990Aug13.005957.3972@rice.edu>
Date: 9 Aug 90 16:29:17 GMT
Sender: sun-spots-request@rice.edu
Organization: Sun-Spots
Lines: 64
Approved: Sun-Spots@rice.edu
Originator: spots@titan.rice.edu
X-Sun-Spots-Digest: Volume 9, Issue 294, message 10

Summary [you can skip to the end if you already know the story]:

25-May-90: Upgraded from SunOS 4.0.1 to SunOS 4.1 on Sun-4/280's (each
with 1 ALM-II, 2 Hitachi disks on a Xylogics 451 controller, 1 tape
drive on a Xylogics 472 controller, and two 8 Mb and one 32 Mb memory
boards). During the first post-upgrade multi-user (logins disabled)
full dump, the system crashed with:

    Memory Error Register 1d4
    DVMA=1, context=0, virtual address=fff3cfc0
    pme=0, physical address=fc0
    panic: writeback error
    syncing file system...

{at this point it hangs and we have to reset from the cpu board, though
in one of the 20 or so crashes it saved a core image}

1-Jun-90: My first message to sun-spots/sun-managers. Got a few
responses describing similar occurrences, but no suggested solution
worked.

20-Jun-90: Frustrated by Sun's lack of responsiveness in looking into
the problem (the hardware support people worked hard, swapping boards,
building test systems, etc., despite their suspicions that the problem
was software related), I posted my second message to
sun-spots/sun-managers and received even more reports of similar
problems, including one from another site that received a similar
brush-off ("multi-user dumps aren't supported").

31-Jul-90: After repeated calls to Sun, getting various managers
involved, and having the problem "escalated" even further, the problem
was finally identified.

**********************************************************************

Fix: Remove from /etc/fstab the line:

    /dev/xy0b swap swap rw 0 0

Apparently in SunOS 4.1, if you have an fstab entry for the default
swap partition, then when you go multi-user and run swapon(8) the
default swap gets added a second time. This eventually leads to the
kernel crashing when dump runs and causes the system to swap. This is
an unconfirmed theory (we are still waiting for our sources), but
removing the fstab entry stopped the system from crashing.

We are now back to daily multi-user incremental dumps on our systems.
Now all we have to do is get one of our machines, whose disk got
trashed when a faulty disk controller was swapped in during one of
numerous experiments, back into full service.

Thanks to everyone who responded with suggestions and reports of
similar occurrences. It helped put the pressure on Sun to get them to
look at the problem seriously.

--Fuat

Internet: fuat@columbia.edu                 U.S. MAIL: Columbia University
BITNET:   fuat@cunixf                       Center for Computing Activities
UUCP:     ...!rutgers!columbia!cunixf!fuat  712 Watson Labs, 612 W115th St.
Phone:    (212) 854-5128  Fax: (212) 662-6442  New York, NY 10025
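
For anyone who wants to check several machines for the fstab entry described in the fix above, here is a minimal sketch in Python. It only reports lines whose third field (the file system type) is "swap"; it does not edit anything. The function name find_swap_entries and the sample fstab text are made up for illustration; only the /dev/xy0b line comes from the message above. Whether a reported device is the kernel's default swap partition on a given machine still has to be judged by hand.

```python
def find_swap_entries(fstab_text):
    """Return (line_number, device) for each fstab line whose type field is 'swap'.

    Assumes whitespace-separated fields in the order:
    device, mount point, type, options, dump frequency, fsck pass.
    """
    entries = []
    for lineno, line in enumerate(fstab_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) >= 3 and fields[2] == "swap":
            entries.append((lineno, fields[0]))
    return entries


# Hypothetical fstab contents; only the swap line is taken from the post.
SAMPLE_FSTAB = """\
# illustrative example only
/dev/xy0a / 4.2 rw 1 1
/dev/xy0b swap swap rw 0 0
"""

if __name__ == "__main__":
    for lineno, device in find_swap_entries(SAMPLE_FSTAB):
        print("line %d: swap entry for %s" % (lineno, device))
```

To check a real system, read the contents of /etc/fstab into a string and pass that to the function instead of the sample text.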