Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!bloom-beacon!athena.mit.edu!jik
From: jik@athena.mit.edu (Jonathan I. Kamens)
Newsgroups: comp.unix.questions
Subject: Re: sparse files
Message-ID: <1989Nov30.144852.7772@athena.mit.edu>
Date: 30 Nov 89 14:48:52 GMT
References: <21581@adm.BRL.MIL>
Sender: root@athena.mit.edu (Wizard A. Root)
Reply-To: jik@athena.mit.edu (Jonathan I. Kamens)
Organization: Massachusetts Institute of Technology
Lines: 39

In article <21581@adm.BRL.MIL> JAZBO@brownvm.brown.edu (James H. Coombs) writes:
>Can someone explain exactly what a sparse file is? How does one get created?

A "sparse file" is a file with a lot more nulls (zero bytes) in it than anything else (well, that's a general definition, but it's basically correct).

Many (although not all -- the Andrew File System, for example, does not) Unix filesystem types can greatly reduce the amount of space taken up by a file that is mostly nulls by not really storing the file blocks that are filled with nulls. Instead, the OS keeps track of how many blocks of nulls there are in between each block that has something other than nulls in it, and feeds nulls to anybody that tries to read the file, even though they're not really being read off of a disk.

You can create a sparse file by fopen'ing a file, fseek'ing far past the end of the file, and then writing something there. The seek by itself doesn't change the file, but once you write, everything between the old end of the file and the place you fseek'ed to reads back as nulls, and the kernel (probably) won't save all of those nulls to disk.

Programs that use dbm often create sparse files, because dbm uses file location as part of its hashing and tries to spread out entries in the database file so there is lots of blank space between them.

The reason sparse files are a problem when it comes to copying is that the kernel isn't smart enough (or perhaps it won't do it because it *is* smart :-) to figure out you're feeding it a sparse file if you actually feed it the nulls.
Therefore, standard file copying programs like cp that just read the file in and write it out in a different location lose, because they end up creating a file that really does take up as much space physically as there are nulls in the abstract file object.

Jonathan Kamens                        USnail:
MIT Project Athena                     11 Ashford Terrace
jik@Athena.MIT.EDU                     Allston, MA  02134
Office: 617-253-8495                   Home: 617-782-0710
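
To make the two points above concrete (how a sparse file gets created, and why a plain read-and-write copy loses the holes), here is a rough sketch in Python rather than C stdio; its seek and write calls behave like fseek and fwrite for this purpose. The names make_sparse, usage and sparse_copy, the 4096-byte block size, and the 10 MB hole are all made up for illustration. It also assumes st_blocks counts 512-byte units, which is common on Unix but not guaranteed, and whether any space is really saved depends on the filesystem.

```python
import os

BLOCK = 4096  # assumed copy granularity; real filesystems pick their own block size


def make_sparse(path, hole_size=10 * 1024 * 1024):
    """Create a file holding one data byte, a hole, and one more data byte."""
    with open(path, "wb") as f:
        f.write(b"a")                   # real data at offset 0
        f.seek(hole_size, os.SEEK_CUR)  # move far past the end; nothing is written yet
        f.write(b"z")                   # this write is what extends the file over the gap


def usage(path):
    """Return (logical size in bytes, bytes apparently allocated on disk)."""
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units, as on most Unix systems.
    return st.st_size, st.st_blocks * 512


def sparse_copy(src, dst, block=BLOCK):
    """Copy src to dst, seeking over blocks that contain only nulls."""
    zero = bytes(block)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(block)
            if not buf:
                break
            if buf == zero[:len(buf)]:
                fout.seek(len(buf), os.SEEK_CUR)  # skip ahead instead of writing nulls
            else:
                fout.write(buf)
        fout.truncate(fin.tell())  # set the final length in case src ends in a hole
```

On a filesystem that supports holes, the second number from usage() for a file built by make_sparse() should come out far smaller than the first, and a copy made with sparse_copy() should stay small too, while a copy that writes every null it reads would not. Note that sparse_copy() can't tell a hole from a block of nulls that really was stored; it turns both into holes.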