Xref: utzoo comp.sys.att:6948 unix-pc.general:3272
Path: utzoo!mnetor!tmsoft!dptcdc!blyndru!hybrid!mdapoz
From: mdapoz@hybrid.UUCP (Mark Dapoz)
Newsgroups: comp.sys.att,unix-pc.general
Subject: 3B1 Boot Loader Story (long)
Message-ID: <1989Jul11.032822.28446@hybrid.UUCP>
Date: 11 Jul 89 03:28:22 GMT
Reply-To: mdapoz@hybrid.UUCP (Mark Dapoz)
Organization: My Good 'Ol 3B1, Toronto, Ontario, Canada
Lines: 51

I recently ran across a rather interesting "feechur" of the 3B1 boot loader. After successfully installing a second hard drive using Gil's instructions, I decided to replace the stock Miniscribe 6085 with a faster drive. To do so I prepared the new drive (format, verify, allocate, etc.) while it was installed as the second hard drive, then mounted it and cpio'ed the data to it from the first drive.

Somehow in doing all this I messed up and managed to allocate the second partition (the page partition) as 0 blocks. This meant that the page partition and the user partition both started at the same location on the disk! Of course I was already halfway through copying the data to the new disk before I realized this, so I immediately stopped the cpio and reallocated the drive partitions using iv and mkfs. I then remounted the drive and started the cpio all over again. Once that was done everything was fine. The drive booted when installed as the primary drive and all was fine..... until the next day.

Sometime in the afternoon there was a power hit and the system was forced to reboot. The familiar boot loader message came up and the "#"'s came across the screen as the kernel was loaded, then nothing. Hmmm, stick the floppy boot disks in, mount the drive and all looks fine. fsck doesn't complain, the kernel looks ok but still doesn't boot, so I restored the kernel from the original foundation disks as /newkern. I rebooted from the floppies, specified /newkern on the HD as the kernel, and up it came with no problems. Fine, link /unix to /newkern and reboot.
The same problem appeared again and the kernel hung! Hmmm, figuring the link must have failed, I rebooted again from floppy and checked the inode numbers of /unix and /newkern. Sure enough, they were the same! It was now about 5 hours since the machine had tried rebooting and I was getting rather desperate as to what to do next.

As a last shot before reformatting the drive and starting over, I decided to use fsdb to look around the filesystem for anything strange. All looked fine until I specified the page partition as the filesystem to debug (don't ask why, I was desperate :-). Lo and behold, fsdb found a filesystem and began to show me files! What, a filesystem on a page partition! Yes, it was the original filesystem that I had created the first time around. When I used iv to rebuild the partitions it didn't remove any of the data, so it was still there, mostly intact.

Now it all starts to come together. It seems the boot loader looks at all the partitions on the drive, one at a time, looking for the name of the kernel you specified. Since the default kernel name is /unix, it found this name on the invalid filesystem on the page partition and tried loading it. Of course when I first built the system most of the data for the filesystem on the page partition was still intact, because I hadn't had enough activity to cause paging to occur. But overnight the news expire probably caused paging, which overwrote the data for /unix but NOT the superblock for the filesystem.

So you see, the boot loader ended up reading a very bad copy of /unix from the wrong filesystem, which completely overrode the /unix on my user filesystem. Once I invalidated the superblock on the page partition using fsdb, all was working fine again.

Ah, the joys of a 3B1....... :-)

--
Mark Dapoz  (mdapoz@hybrid.UUCP)  ...uunet!{mnetor,dptcdc}!hybrid!mdapoz
   I remind you that humans are only a tiny minority in this galaxy.
       -- Spock, "The Apple," stardate 3715.6.
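
For anyone who wants the search order spelled out, here is a toy model in Python of the behaviour guessed at above. It is only a sketch of the "first partition with a usable filesystem and a matching name wins" idea and not the real 3B1 loader code. The Partition class, the find_kernel function, and the file contents are all made up for illustration, and it assumes the page partition is scanned before the user partition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Partition:
    """Hypothetical stand-in for one disk partition."""
    name: str
    has_valid_superblock: bool
    files: Dict[str, str] = field(default_factory=dict)


def find_kernel(partitions: List[Partition],
                kernel_name: str = "/unix") -> Optional[Tuple[str, str]]:
    """Return (partition name, file contents) for the first match.

    Assumed behaviour: partitions are scanned in order, and any
    partition whose superblock looks valid is searched for the name.
    """
    for part in partitions:
        if not part.has_valid_superblock:
            continue  # nothing that looks like a filesystem here
        if kernel_name in part.files:
            return part.name, part.files[kernel_name]
    return None


# Stale filesystem left behind on the page partition: the superblock
# survived, but the data blocks of /unix were overwritten by paging.
page = Partition("page", True, {"/unix": "overwritten by paging"})
user = Partition("user", True, {"/unix": "good kernel",
                                "/newkern": "good kernel"})
disk = [page, user]  # assumed scan order: page partition first

before = find_kernel(disk)               # default name /unix
newkern = find_kernel(disk, "/newkern")  # name exists only on user

# Invalidating the stale superblock (done with fsdb in the story)
# makes the scan skip the page partition.
page.has_valid_superblock = False
after = find_kernel(disk)

print(before, newkern, after)
```

In this model /newkern boots because the name exists only on the user partition, while /unix keeps resolving to the damaged copy until the stale superblock is gone, which matches what happened above.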