Path: utzoo!attcan!uunet!europa.asd.contel.com!noc.sura.net!haven!udel!wuarchive!hsdndev!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.unix.internals
Subject: Re: How do you find the symbolic links to files.
Message-ID: <2993:Dec1202:37:2090@kramden.acf.nyu.edu>
Date: 12 Dec 90 02:37:20 GMT
References: <1990Dec7.192441.24778@dg-rtp.dg.com> <2469:Dec1001:13:4390@kramden.acf.nyu.edu> <1990Dec10.191522.2757@erg.sri.com>
Organization: IR
Lines: 88

In article <1990Dec10.191522.2757@erg.sri.com> zwicky@erg.sri.com (Elizabeth Zwicky) writes:
> In article <2469:Dec1001:13:4390@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >Elizabeth said that ``you have to get pretty intimate with the disk'' to
> >tell that a file has holes, or something like that. She concluded that
> >an archiver can with good conscience restore files with as many holes as
> >possible, hence saving as much space as possible.
> No, actually, Elizabeth didn't say either of those things.

Well, sorry, I thought it was Elizabeth who said ``you have to get
pretty intimate with the disk to tell that the 20 meg of nulls aren't
there'' in <1990Dec5.052124.28435@erg.sri.com>. And who agreed in a
later article with Tom's conclusions. But this is beside the point.

Does anyone else understand the importance of restoring as much stat
information as possible? It's an archiver's duty to do as good a job as
it can. Now Elizabeth's position has been that an archiver cannot do
this without going beyond the stat information and reading the raw disk.
Other people have agreed that you don't need raw access, but claim that
dumps become a lot slower. I'm more of an optimist:

1. On a system without st_blocks, an archiver can lseek past every
   0-filled region. The system will automatically use holes wherever
   possible.
   (A) This doesn't require raw disk access.
   (B) Since stat doesn't care about holes, this doesn't destroy any
       information.
   (C) This wastes only restore time, not dump time.

2. On a system with st_blocks, an archiver can lseek past the first N
   0-filled regions, enough to restore st_blocks; and then it can write
   explicit zeros in the rest. Even if it doesn't know the block size,
   it can use trial and error to get the right st_blocks, as Barry
   illustrated in a previous article; since most files in practice do
   not have holes, this will rarely be necessary.
   (A) This does not require raw disk access.
   (B) st_blocks is restored as we want.
   (C) This wastes only restore time, not dump time; and it only wastes
       restore time on files that actually do have holes.

3. On a system with full information about the locations of holes, an
   archiver can trivially record the locations and lseek appropriately
   on restore.
   (A) This does not require raw disk access.
   (B) All stat information is restored as we want.
   (C) This doesn't waste any time.

4. On a system... well, I've never seen any systems that don't fall
   under #1 or #2, and hopefully future systems will be under #3.

People talking about ``portability'' simply don't understand what's
going on here. An archiver ON SYSTEM X is responsible for restoring stat
information as returned BY SYSTEM X. It is incredibly asinine to say
``#2 is wrong on an AT&T system''---#2 is not *meant* for an AT&T
system!

> What I did say is that you cannot tell the difference between a hole
> and an equivalent number of nulls without reading raw blocks.
> st_blocks at best tells you how many holes there are; it doesn't tell
> you *where*.

Right! So on a system with st_blocks, the archiver's responsibility is
to restore the right number of holes. It can do this by making the first
N zero-filled blocks into holes, with no regard to the original
positions. This does *not* require access to the raw disk blocks.
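
To make the restore side of #1 and #2 concrete, here is a minimal
sketch, written in Python only for brevity. The name restore, the
max_holes parameter, and the 4096-byte block size are illustrative
assumptions, not taken from any real archiver; a real one would derive
N from the original st_blocks and might have to find the block size by
trial and error, as described above.

```python
import os

BLOCK = 4096  # assumed block size; hypothetical, not from the post


def restore(data, path, max_holes=None, block=BLOCK):
    """Write data to path, seeking past zero-filled blocks.

    max_holes=None sketches strategy #1: skip every zero-filled block.
    max_holes=N sketches strategy #2: skip only the first N such
    blocks and write explicit zeros for the rest.
    Returns the number of blocks that were skipped.
    """
    zero = bytes(block)
    skipped = 0
    with open(path, "wb") as f:
        for off in range(0, len(data), block):
            chunk = data[off:off + block]
            may_skip = max_holes is None or skipped < max_holes
            if chunk == zero and may_skip:
                # Seek instead of write; the system may leave a hole here.
                f.seek(block, os.SEEK_CUR)
                skipped += 1
            else:
                f.write(chunk)
        # A skipped block at the very end would not extend the file,
        # so set the final length explicitly.
        f.truncate(len(data))
    return skipped
```

Whether a skipped block really becomes a hole is up to the filesystem;
on a system with st_blocks, comparing os.stat(path).st_blocks against
the recorded value is one way to check the result. Nothing here reads
the raw disk.
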
> Just as programs may, conceivably, care what st_blocks is
> (care to name one that does?), they may also care where the holes are
> (I have no examples of this one either, but it's equally imaginable).

Yes, it is conceivable that a vendor would have a system returning
different stat information. Here's the most important point I'm trying
to make: On *that* system it is the archiver's responsibility to restore
that stat information returned by *that* system. Do you understand this?

It is even conceivable that a vendor will provide stat information that
can't be restored properly without raw disk access. In your December 5
article you were trying to cast ``gloom'' on archivers for exactly this
reason. But that's simply not true for System V or for standard BSD.

> I conclude from this that good archivers are not portable. One can
> arguably conclude that if you want a portable program, you can in good
> conscience restore files with as many holes as possible, since you
> can't get it right.

No! This is what Tom said, and it is entirely wrong. On a BSD system the
right strategy is #2: do what's necessary to restore st_blocks. A
program can reasonably depend on that information, so an archiver that
doesn't restore st_blocks is buggy.

---Dan