Xref: utzoo comp.unix.ultrix:5188 comp.sys.dec:4414
Path: utzoo!utgpu!watserv1!watmath!att!att!pacbell.com!mips!zaphod.mps.ohio-state.edu!usc!cs.utexas.edu!uunet!mcsun!inria!ircam!mf
From: mf@ircam.fr (Michel Fingerhut)
Newsgroups: comp.unix.ultrix,comp.sys.dec
Subject: Partioning disks -- warning...
Message-ID: <1990Nov4.163016.3492@ircam.ircam.fr>
Date: 4 Nov 90 16:30:16 GMT
Sender: mf@ircam.ircam.fr (Michel Fingerhut)
Organization: IRCAM, Paris (France)
Lines: 68

Summary:

The chpt utility does not check for blatantly incorrect partitioning
of a disk.  Neither do any of the file system checking utilities, nor,
apparently, the disk driver (and/or the error log mechanism).
Moral: see below.

Description:

I happened to set the top of the last partition on an ra90 connected
to a 5820 (really a 5810) under ultrix 4.0 *beyond* the physical size
of the disk by a couple of hundred sectors.  Then all sorts of bad
things happened.

a. chpt did not complain (it could have warned me, e.g. by looking at
   the info in /etc/disktab or otherwise).  Well, it was happy.

b. newfs didn't either.  It made a file system on that partition, with
   a map that included the nonexistent blocks.  Funny: it takes the
   disk type as an argument, so it could have looked in disktab, found
   the size of the disk, and then said: last partition too big, do you
   really mean this, Michael?  Well, it did not.

c. A file was apparently created with blocks taken from that
   nonexistent pool (sounds like an Italo Calvino title, if you ask
   me).  Every time those (nonexistent) blocks were accessed, an error
   occurred and an attempt was made to replace them (and that failed
   too).  The bad block table had also been overwritten.

d. The uerf messages were cryptic -- so much so that they led DEC to
   believe the disk was physically damaged (rather than the victim of a
   stupid software problem), and they replaced it.
   Messages were:

        DISK TRANSFER ERROR
        DATA ERROR
        INVALID HEADER
        BAD BLK REPL ATTMPT
        REPLACEMENT FAILURE, INCONSISTENT RCT
        MEDIA FORMAT ERROR
        RCT CORRUPTED

e. elcsd got so many messages that it ate all the cpu time, barely
   finding the time to announce on the console, every second, the loss
   of about 2000 messages to the ErrorLog.  I had to halt the machine
   and boot single user to regain any sort of control.

f. Using ncheck and clri I removed all inodes pointing to these
   nonexistent blocks, repartitioned, and ran fsck several times.
   After it announced it was happy, the system crashed the first time
   I created a directory.

Moral:

Don't trust any program to have safeguards.  Read the man pages and,
where the information is contradictory, decide which source you trust,
believe or understand.  E.g., ra(4) says about the c partition of an
ra90:

        disk    start   length
        ...
        ra?c    0       2409680
        ...

while /etc/disktab says:

        :pc#2376153:bc#8192:fc#1024:\

This is a significant difference (33527 sectors).  The difference
between a happy file system and nights of crashes, fscks, backups and
restores.
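
The check that chpt and newfs could have made is small.  What follows
is only a sketch of the idea, not how either utility works: it pulls
the c partition size out of the disktab fragment quoted above, compares
it with the ra(4) figure, and flags any partition whose last sector
lies beyond the limit.  The function names, the choice of the smaller
figure as the "safe" limit, and the a and h partition offsets are
assumptions made up for illustration; only the two c partition sizes
come from the post.

```python
import re


def disktab_partition_size(entry, partition):
    """Return the sector count from a 'p<partition>#<n>' capability, or None."""
    match = re.search(r":p%s#(\d+):" % re.escape(partition), entry)
    return int(match.group(1)) if match else None


def overrunning_partitions(partitions, disk_sectors):
    """Return the names of partitions that end past disk_sectors.

    partitions maps a name to (start, length), both in sectors.
    """
    return [name for name, (start, length) in sorted(partitions.items())
            if start + length > disk_sectors]


# The two figures quoted in the post for the c partition of an ra90.
RA4_MAN_PAGE_C = 2409680
DISKTAB_FRAGMENT = ":pc#2376153:bc#8192:fc#1024:\\"
disktab_c = disktab_partition_size(DISKTAB_FRAGMENT, "c")

# Assumption: until the real capacity is known, trust the smaller figure.
safe_limit = min(RA4_MAN_PAGE_C, disktab_c)
discrepancy = RA4_MAN_PAGE_C - disktab_c

# Hypothetical layout: the a and h entries are invented for illustration.
h_start = 2000000
layout = {
    "a": (0, 32768),
    "c": (0, RA4_MAN_PAGE_C),
    "h": (h_start, RA4_MAN_PAGE_C - h_start),
}

print("sources disagree by", discrepancy, "sectors")
print("partitions past the safe limit:",
      overrunning_partitions(layout, safe_limit))
```

With a layout built from the ra(4) length, both c and h run past the
disktab figure, which is exactly the kind of thing worth a warning
before a file system is made on them.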