Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!uunet!sparky!dsndata!wayne
From: wayne@dsndata.uucp (Wayne Schlitt)
Newsgroups: comp.sys.hp
Subject: Do NOT use ftio or cpio for backups
Message-ID:
Date: 7 Jun 91 21:27:58 GMT
Sender: wayne@dsndata.UUCP
Distribution: comp
Organization: Design Data
Lines: 153

well, it's a long story, but as a result of trying to install hp-ux 7.05, i had to re-initialize my main disk and restore everything from backups. that's when i found out that for at least the last six months, all of the backups we have been doing were corrupted!

DO NOT, i repeat DO NOT use ftio or cpio for backups. there are SERIOUS problems with them. for those who do not know, the /etc/backup script uses cpio. remove /etc/backup from your disk. it is an accident waiting to happen.

basically, there are three problems with ftio, and two of the three problems also happen with cpio because ftio and cpio write compatible files.

problem #1:

the first problem with ftio is that it uses shared memory and it allocates the shared memory segment at the default location. this default location only leaves 64k to be used by malloc. ftio needs to malloc memory to keep track of the links on the files. 64k doesn't go very far. when the memory runs out, ftio stops linking files. this means that you get duplicate files and that can cause your disk space usage to go way up. (in our case, it filled up the disc.)

you can change a kernel parameter (shmbrk) to "fix" this problem, but i don't know what formula to use to calculate what this value needs to be, and if you set it too large, then programs that allocate large amounts of shared memory will start to fail because you are reserving too much memory for malloc. *sigh*. we are currently using a shmbrk value of 1024 (4MB for malloc), but i don't know if this will restore our entire system since i didn't try restoring /usr/spool/news.

this problem also happens with cpio.
cpio documents the problem in the bugs section of the man page; ftio makes no such warning. i (wrongly) assumed that this meant that ftio didn't have this problem. unfortunately, i don't know of any way to get around this problem when using cpio.

problem #2:

the second problem is much more serious and it affects both cpio and ftio. the problem is that the inode numbers that are written to tape are limited to 64k. under the old ATT file system, you could not have more than 64k inodes on the disk. using the berkeley fast file system (like hp and sun have used for years) you can have more than 64k inodes. this means that if you have a file at inode 65556 (64k+20), cpio can't tell the difference between that file and the file at inode 20. since the size of the inode field is defined to be this way for compatibility reasons, i will probably not see any change in cpio.

so, if you have a file with inode 65556 (64k+20) that is linked to another file, when you try to restore that file, it will actually link it to the file at inode 20. this means that when you try to restore your system, you could end up with, say, an article out of talk.bizarre linked to /dev/kmem. in fact, when we restored our disk, we had things that were just as bad, if not worse than this.

**** THE RESULT ****

IF you have usenet news, OR you have a hp cluster (lots of links in the /dev/pty directories), OR you have a large hard disk, OR you use ACLs, you could very easily run into this problem. to the best of my knowledge, there is no way around the problem because the information is lost when you do the backup. you are screwed.

problem #3:

this one i didn't personally run into, but i guess it is a fundamental problem with cpio, ftio, tar, and fbackup. only dump and dd from the raw device do not have this problem.

basically, when cpio et al read a file to back it up, it causes the system to change the "access time" of the file.
if the access time isn't reset, then you can't do incremental backups 'cause time stamp info has been lost. so, cpio et al will reset the access time via the utime(2) system call. but this still leaves the "inode changed time" set. so, cpio et al can't depend on the inode changed time to tell them if the file needs to be backed up.

other commands, such as chmod and chown, will change the inode changed time, but not the access time. this means that if the only thing you have done is change the permissions or ownership of a file since the last backup, then the incremental backup will not notice and the new file stats will not be backed up. this is only a problem if you are doing incremental backups.

in the HP manuals, they list 4 different ways of doing backups. the first two are cpio and ftio. i can't believe that with such fundamental problems with cpio/ftio they would even suggest using them. the other two methods that they suggest are dd and fbackup. dd doesn't work if the media that you are backing up to is smaller than the disk you are backing up, it doesn't back up a live file system very well, and it is hard to restore from. fbackup is non-standard, and still has one of the three problems.

it kind of looks like the one command that hp doesn't mention is dump, and it is the only way to get really reliable backups. (actually, they don't mention tar either, but from what i understand, tar has its own set of major problems if you try to use it for backups.)

we had our computer down for 2 1/2 days during the week, and it took me and another sysadmin the better part of a week to finish cleaning up from this mess. i really expected to be able to format the disk and restore from backups and have a working system in a few hours. instead, the other sysadmin and i worked, literally, night and day for over 2 days just to get things working again.

we are a vab, and the thought of this kind of thing happening to one of our customers really scares me.
i couldn't imagine trying to walk someone through all of this over the phone. we would probably have ended up flying out to the customer site or something.

the only thing i can think of that is more important than good backups is being able to install the system in the first place. does hp really expect people with less than 5 years of unix experience and good software development skills to be able to do backups? how were we supposed to know that the backups we were doing were corrupt? scratch the disk and do a restore, see if it works, and if it doesn't, go off and find another way? how many weeks worth of work does hp expect every customer to do in order to find a workable backup system? why do they even mention cpio and ftio in the manuals?

in case you can't tell, i am more than a little upset about all this. because of the backup being bad, we weren't able to get some important things done for a trade show. working 18 hours a day for 3 days didn't do much for my disposition.

anyway, i guess fbackup, dd or dump are the _only_ commands you should be using for backups.

-wayne
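
a rough way to check whether a file system is exposed to problems #1 and #2 above, sketched in python under some assumptions. the post only says the inode numbers on tape are "limited to 64k"; the sketch assumes that means only the low 16 bits are kept. it also assumes that a collision only matters when at least one of the colliding files has more than one link. the function names (truncate_inode, scan_tree, count_linked, find_collisions) are made up for this illustration and are not part of cpio or ftio.

```python
import os
from collections import defaultdict

# assumption: the "64k" limit described above means the archive header
# keeps only the low 16 bits of each inode number
INODE_MASK = 0xFFFF


def truncate_inode(ino):
    """Return the inode number as a 16-bit header field would store it."""
    return ino & INODE_MASK


def scan_tree(root):
    """Yield (device, inode, link_count) for every non-directory under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            yield (st.st_dev, st.st_ino, st.st_nlink)


def count_linked(entries):
    """Count distinct inodes with more than one link.

    This is roughly how many entries a backup tool's link table needs
    (problem #1).
    """
    return len({(dev, ino) for dev, ino, nlink in entries if nlink > 1})


def find_collisions(entries):
    """Group distinct inodes on one device that share a truncated number.

    Only groups with at least one multiply-linked inode are reported
    (assumption: those are the ones a restore could link to the wrong file).
    """
    links = {(dev, ino): nlink for dev, ino, nlink in entries}
    buckets = defaultdict(list)
    for (dev, ino), nlink in links.items():
        buckets[(dev, truncate_inode(ino))].append((ino, nlink))
    return {
        key: sorted(ino for ino, _ in group)
        for key, group in buckets.items()
        if len(group) > 1 and any(nlink > 1 for _, nlink in group)
    }
```

to try it, collect the entries once with something like entries = list(scan_tree("/usr")) and pass that list to count_linked() and find_collisions(). an empty result from find_collisions() only says the tree looks safe under the assumptions above; it says nothing about what a particular cpio or ftio version really does.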