Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!wugate!uunet!sco!abs
From: abs@sco.COM (Amy Snader)
Newsgroups: comp.unix.xenix
Subject: Bug in SCO 3.2 kernel -- please read
Keywords: can cause panic or corruption ; workaround enclosed
Message-ID: <950@fiasco.sco.COM>
Date: 6 Oct 89 00:40:25 GMT
Organization: The Santa Cruz Operation, Inc.
Lines: 65

Dear Netfolk:

We have found a bug in SCO UNIX 3.2.0f that can cause filesystem
corruption and/or panics, under some rather unusual circumstances.
Very few sites will ever be bitten by this bug, but since it has a
simple workaround, I urge you all to install the workaround even if
you are not at risk.

Provoking the bug requires that you bring the system down dirty,
without running /etc/haltsys or /etc/shutdown, and then boot a
kernel *different* from the one you were running before. Switching
between kernels that have had their kernel parameters tuned in
different ways should be fine, but switching between pairs of kernels
that contain different sets of device drivers can sometimes trigger
the problem.

When you reboot the system using the new kernel, the filesystem will
be dirty from the previous abnormal shutdown, and fsck will ask you if
you want to clean your filesystem. If you answer "yes" to this
question, fsck will clean the filesystem and then attempt to remount
the root device. The bug is in the code that remounts the root -- some
data can be lost across the remount.

Symptoms of this can include a panic, a trap, or (rarely) actual
filesystem corruption. If a panic occurs, it will be a "panic: trying
to free already free block". If a trap occurs, the `eip' register
will point to an address in the kernel routine `getfblk'. There's not
much I can say about the filesystem corruption, except to assure you
that it is the least likely of the possible scenarios.

This bug has a very simple workaround. If the filesystem has been
modified at the time that fsck is run, the bug will not occur.
The shell script that causes fsck to be run is called `/etc/bcheckrc'.
By adding a line to this script that slightly modifies the filesystem
before running fsck, the bug can be prevented.

In the file `/etc/bcheckrc', immediately before the lines that read:

    [ "$dofsck" ] &&
        /bin/su root -c "/etc/fsck -y -s -D -b -a ${rootfs}"

place the line:

    > /lost+found/magic_file ; rm /lost+found/magic_file

Note that the name "magic_file" is not significant. You can rename
this file as you like, but I do recommend that you place the file
within "/lost+found", because that directory has slots preallocated
for some new files. If perchance the directory /lost+found does not
exist, please create it.

This bug has been fixed in the upcoming Open Desktop release of 3.2.
No support-level kernel fix is planned, though, because the bug can
so easily be worked around. The bug does not affect any release of
Xenix.

If any of you have any questions about this bug, please feel free to
mail me.

--Amy (uunet!sco!abs decvax!microsof!sco!abs)
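
For anyone who would rather script the edit than make it by hand, the
following is a minimal sketch only, not something shipped or tested by
SCO. It assumes a Bourne shell with function support and an awk that
provides index(). The function name add_workaround is made up for
illustration, and the sketch assumes the fsck invocation can be
recognized by a line containing both "$dofsck" and "/etc/fsck". It
writes the modified script to standard output and never touches the
original file.

```shell
# add_workaround FILE
# Hypothetical helper: print a copy of a bcheckrc-style script with the
# workaround line placed immediately before the first line that mentions
# both $dofsck and /etc/fsck. The input file is not modified.
add_workaround() {
    # If the workaround line is already present, pass the file through.
    if grep 'lost+found/magic_file' "$1" >/dev/null 2>&1; then
        cat "$1"
        return 0
    fi
    awk '
        # Insert once, just ahead of the first matching fsck line.
        !inserted && index($0, "$dofsck") && index($0, "/etc/fsck") {
            print "> /lost+found/magic_file ; rm /lost+found/magic_file"
            inserted = 1
        }
        { print }
    ' "$1"
}

# Suggested use (review the result before installing it by hand):
#   add_workaround /etc/bcheckrc > /tmp/bcheckrc.new
#   diff /etc/bcheckrc /tmp/bcheckrc.new
```

If the diff shows anything other than the one added line, or shows no
change at all, the layout of your bcheckrc differs from what the sketch
assumes and the edit is better made by hand.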