Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!mcnc!rti!dg-rtp!larrybud.rtp.dg.com!goudreau
From: goudreau@larrybud.rtp.dg.com (Bob Goudreau)
Newsgroups: comp.unix.internals
Subject: Re: How do you find the symbolic links to files.
Message-ID: <1990Dec5.190610.5612@dg-rtp.dg.com>
Date: 5 Dec 90 19:06:10 GMT
References: <25146@adm.brl.mil> <1990Dec5.052124.28435@erg.sri.com> <10960:Dec507:07:4190@kramden.acf.nyu.edu>
Sender: usenet@dg-rtp.dg.com (Usenet Administration)
Reply-To: goudreau@larrybud.rtp.dg.com (Bob Goudreau)
Organization: Data General Corporation, Research Triangle Park, NC
Lines: 47

In article <10960:Dec507:07:4190@kramden.acf.nyu.edu>,
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>
> > Unfortunately, you
> > have to get pretty intimate with the disk to tell that the 20 meg of
> > nulls aren't there
>
> Hardly. You just look at the file size. Other than the file size,
> there is no way a portable program can tell the difference between
> a hole and an allocated block of zeros. If an archiver knows the
> block size and sees that a file has N holes, it can just squish the
> first N holes it finds, and write explicit zeros in the remaining
> zero-filled blocks.

By "file size" and "portable", I assume that you are talking about the
st_size field of the POSIX.1-defined struct stat.  This number means
only "the file size in bytes"; it says nothing about how many bytes
(or blocks) the file occupies on disk.  Some UNIXes with BSD-derived
file systems also define a field called st_blocks that reports the
number of blocks occupied by the file, but this isn't much help.  For
one thing, it isn't portable over all UNIXes; for another, it tells
you nothing about the number and location of holes in the file.  A
truly portable method must use only standard functions (such as the
ones defined in POSIX.1) and must assume nothing at all about block
sizes or any other aspects of the file system structure.
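
To make the st_size/st_blocks distinction concrete, here is a minimal
sketch in C.  It is only an illustration, under assumptions that are not
from the post: the function names looks_sparse() and report() are
hypothetical, and st_blocks is assumed to count 512-byte units, which is
common on BSD-derived systems but is exactly the sort of thing a portable
program cannot rely on.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

#define BLOCK_UNIT 512L  /* ASSUMPTION: st_blocks counts 512-byte units */

/* Returns 1 if the file occupies fewer bytes on disk than st_size
 * claims, 0 if it does not, and -1 if stat() fails.
 * Hypothetical helper; relies on the non-portable st_blocks field. */
int looks_sparse(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    return (long long)sb.st_blocks * BLOCK_UNIT < (long long)sb.st_size;
}

/* Prints the logical size and the allocated block count for one file. */
void report(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0) {
        perror(path);
        return;
    }
    printf("%s: st_size=%lld bytes, st_blocks=%lld\n",
           path, (long long)sb.st_size, (long long)sb.st_blocks);
}
```

Even where this works, a result of 1 says only that some blocks are
missing; it does not say how many holes there are or where they begin
and end.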

The obvious portable approach is to have the archiver program read()
all bytes of the file while keeping an eye out for long stretches of
0-valued bytes so that it can store them in a special space-saving
manner in its archive.  The unarchiving step must then perform an
lseek() over each such stretch in order to avoid write()ing out
potentially space-consuming null bytes.

Unfortunately, while such an approach is portable, its performance
will leave something to be desired on files with truly tremendous
holes in them; much time will be wasted on read()ing the holes.
That's why competent archiver utilities such as dump(1M) do in fact
get pretty intimate with the system (the file system format, not the
disk).  By snooping around in the file system that contains the file,
dump can quickly locate all holes in the file and avoid reading
useless data.

----------------------------------------------------------------------
Bob Goudreau                            +1 919 248 6231
Data General Corporation                goudreau@dg-rtp.dg.com
62 Alexander Drive                      ...!mcnc!rti!xyzzy!goudreau
Research Triangle Park, NC  27709, USA
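
The portable read()/lseek() approach described in this post can be
sketched roughly as follows.  This is a simplified illustration, not
what any particular archiver does: it collapses the archive and
unarchive steps into a single copy, the names sparse_copy() and
sparse_copy_path() are hypothetical, and CHUNK is an arbitrary buffer
size rather than a file system block size.  It also assumes the output
descriptor is seekable and was not opened with O_APPEND.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096  /* arbitrary buffer size; NOT assumed to be a block size */

/* Returns 1 if all n bytes of buf are zero, else 0. */
static int all_zero(const char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/* Writes all n bytes, retrying after short writes.  0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w <= 0)
            return -1;
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Copies in_fd to out_fd, seeking over zero-filled chunks instead of
 * writing them.  Returns the number of bytes read, or -1 on error. */
off_t sparse_copy(int in_fd, int out_fd)
{
    char buf[CHUNK];
    off_t pending = 0;   /* zero bytes seen but not yet accounted for */
    off_t total = 0;
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        total += n;
        if (all_zero(buf, (size_t)n)) {
            pending += n;            /* defer: maybe we can skip these */
            continue;
        }
        if (pending > 0 && lseek(out_fd, pending, SEEK_CUR) == (off_t)-1)
            return -1;
        pending = 0;
        if (write_all(out_fd, buf, (size_t)n) != 0)
            return -1;
    }
    if (n < 0)
        return -1;
    if (pending > 0) {
        /* The file ends in zeros: seek to one byte short of the end and
         * write a single zero so the output reaches its full length. */
        if (lseek(out_fd, pending - 1, SEEK_CUR) == (off_t)-1)
            return -1;
        if (write_all(out_fd, "", 1) != 0)
            return -1;
    }
    return total;
}

/* Convenience wrapper: copies the file named src to the file named dst. */
off_t sparse_copy_path(const char *src, const char *dst)
{
    off_t result;
    int in_fd, out_fd;

    in_fd = open(src, O_RDONLY);
    if (in_fd < 0)
        return -1;
    out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (out_fd < 0) {
        close(in_fd);
        return -1;
    }
    result = sparse_copy(in_fd, out_fd);
    close(in_fd);
    if (close(out_fd) != 0)
        result = -1;
    return result;
}
```

Note that the loop still read()s every byte of every hole, which is the
performance cost mentioned above, and that whether the skipped regions
actually end up as holes in the output is up to the file system.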