Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!sdd.hp.com!news.cs.indiana.edu!rutgers!mcnc!rti!dg-rtp!larrybud.rtp.dg.com!goudreau
From: goudreau@larrybud.rtp.dg.com (Bob Goudreau)
Newsgroups: comp.unix.internals
Subject: Re: How do you find the symbolic links to files.
Message-ID: <1990Dec7.192441.24778@dg-rtp.dg.com>
Date: 7 Dec 90 19:24:41 GMT
References: <1990Dec5.052124.28435@erg.sri.com>
	<10960:Dec507:07:4190@kramden.acf.nyu.edu>
	<1990Dec5.190610.5612@dg-rtp.dg.com>
	<6647:Dec619:11:3690@kramden.acf.nyu.edu>
Sender: usenet@dg-rtp.dg.com (Usenet Administration)
Reply-To: goudreau@larrybud.rtp.dg.com (Bob Goudreau)
Organization: Data General Corporation, Research Triangle Park, NC
Lines: 81

In article <6647:Dec619:11:3690@kramden.acf.nyu.edu>,
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>
> > it [st_blocks] tells you nothing about the number and location of
> > holes in the file.
>
> That's quite correct. In the article you're responding to, I wrote
> ``it can just squish the first N holes it finds, and write explicit
> zeros in the remaining zero-filled blocks.'' One might infer from
> this that there is no way to detect the locations of the holes. So
> what?

First you say that the archiver should perform certain actions on
"the holes it finds", then you admit that "there is no way to detect
the locations of the holes". So how, pray tell, is it supposed to
find them? The only portable way is to examine the file data looking
for stretches of nulls; but as I mentioned, this makes your program
slower than it has to be.

> > A truly portable method must use only standard functions (such as the
> > ones defined in POSIX.1) and must assume nothing at all about block
> > sizes or any other aspects of the file system structure.
>
> Well, if a POSIX system doesn't have st_blocks, then obviously a
> portable program can't figure out that a file has holes, so there's no
> point to figuring out how many holes there are.
> But every POSIX-based
> system I've seen does have st_blocks.

Broaden your horizons a little. A vast number of UNIX systems in the
world are not BSD-based and do not have st_blocks. Since POSIX.1 also
does not require it, any software that relies on st_blocks' presence
will be seriously limiting its claims of portability.

But even that's beside the point; the real issue is that st_blocks
alone gives you very little useful information. Given a file's
st_blocks and st_size counts, you can't say for certain that the file
doesn't have any holes unless you also have some knowledge of the
underlying file system format and its allocation mechanism. (Remember
that st_blocks also counts things like indirect blocks and any blocks
that may be allocated past the end of the file.) And even if you
could determine the number and size of any holes in the file,
st_blocks doesn't tell you where they are, so you still have to
examine the file data anyway.

Since st_blocks doesn't win much for us unless accompanied by other
information acquired by non-portable means, we might as well forget
about portability and have the archiver munge through the file system
structures directly (a la dump(1M)).

> > The obvious
> > way to do this is to have the archiver program read() all
> > bytes of the file while keeping an eye out for long stretches of
> > 0-valued bytes so that it can store them in a special space-saving
> > manner in its archive.
>
> This is only slow on files that do have holes, and then only on long
> stretches of zeros.

Er, yes, that's the point, isn't it? We're discussing how to make an
archiver that wastes neither time nor tape.

> > Unfortunately, while such
> > an approach is portable, its performance will leave something to be
> > desired on files with truly tremendous holes in them; much time will
> > be wasted on read()ing the holes.
>
> No, there won't be any read() time wasted. There will be CPU time
> wasted.
> (Tom points out in another article that vectorization helps
> here.)

Yes, there will be read() time wasted; the archiver must read() the
entire file a chunk at a time and then check each chunk for zeros.
For holes, the read()s shouldn't translate into many actual disk
reads (except for the indirect blocks), but you're still making a lot
of read() calls that would be totally unnecessary if you avoided the
holes entirely.

----------------------------------------------------------------------
Bob Goudreau                        +1 919 248 6231
Data General Corporation            goudreau@dg-rtp.dg.com
62 Alexander Drive                  ...!mcnc!rti!xyzzy!goudreau
Research Triangle Park, NC 27709, USA