Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uunet!dg!rec
From: rec@dg.dg.com (Robert Cousins)
Newsgroups: comp.unix.questions
Subject: Re: sparse files
Message-ID: <235@dg.dg.com>
Date: 1 Dec 89 13:50:56 GMT
References: <21581@adm.BRL.MIL>
Reply-To: uunet!dg!rec (Robert Cousins)
Organization: Data General, Westboro, MA.
Lines: 81

In article <21581@adm.BRL.MIL> JAZBO@brownvm.brown.edu (James H. Coombs) writes:
>"Sparse files" have been mentioned in several recent postings.  For example:
>
>>Kemp@DOCKMASTER.NCSC.MIL writes:
>>>Just for the record, is there *any* way to do a recursive copy
>>>correctly?  I.e. one that doesn't:
>>>   * turn symbolic links into actual files
>>>   * turn link loops into a series of infinitely nested copies
>>>   * alter the modify and change times
>>>   * choke on block and character special files
>>>   * turn holes in sparse files into real disk blocks
>>I think afio will do this.  I am not sure about the symlink
>>stuff, though, as we're a SYS V only site.
>
>Can someone explain exactly what a sparse file is?  How does one get created?
>
>--Jim
>
>Dr. James H. Coombs
>Senior Software Engineer, Research
>Institute for Research in Information and Scholarship (IRIS)
>Brown University, Box 1946
>Providence, RI 02912
>jazbo@brownvm.bitnet
>Acknowledge-To:

A sparse file is one which has "holes" in it.  Specifically, the amount
of space required to store the file on disk is less than the length of
the file (offset of the last byte).  A sparse file can be created under
UNIX by creating a file and then simply choosing not to write some
portions of the file.
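That definition can be checked roughly from a program.  What follows is
only a sketch, and it assumes a system whose struct stat carries an
st_blocks field counted in 512-byte units (true of BSD-derived systems
and Linux, but not guaranteed everywhere).  The name looks_sparse() is
made up for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * looks_sparse() is a hypothetical helper for illustration.
 * It prints the file's length and the space allocated to it, and
 * returns 1 if the allocation is smaller than the length, 0 if it
 * is not, and -1 if stat() fails.
 * Assumption: st_blocks counts 512-byte units.
 */
int looks_sparse(const char *path)
{
    struct stat sb;
    long long length, allocated;

    if (stat(path, &sb) < 0)
        return -1;

    length = (long long)sb.st_size;            /* offset of last byte + 1 */
    allocated = (long long)sb.st_blocks * 512; /* space actually in use */

    printf("%s: length %lld bytes, allocated %lld bytes\n",
           path, length, allocated);
    return allocated < length;
}
```

A result of 1 is a hint, not proof: a file system that compresses data
may report a small allocation for a file that has no holes at all.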
The following program creates a sparse file:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int fd, status;
	off_t position;
	static char buffer[] = "This is a test of sparse files.";

	fd = open("test.file", O_RDWR | O_CREAT, 0666);
	if (fd < 0) {
		printf("Unable to open/create file.\n");
		exit(1);
	}

	/* Seek past the end of the empty file; nothing is allocated for the gap. */
	position = lseek(fd, 100000, SEEK_SET);
	printf("Moved the file to offset %ld\n", (long)position);

	/* Only the blocks that hold these bytes need to be allocated. */
	status = write(fd, buffer, sizeof(buffer));
	printf("Result status of write is %d\n", status);

	close(fd);
	exit(0);
}

UNIX treats the "holes" as 0's when read.  In fact, UNIX has only
minimal support for sparse files.  Backing up sparse files often
involves copying large amounts of nulls.  Once an area of a file is
written, it cannot be returned to its previous sparse state.  One
cannot REALLY tell (without heroic effort) if a given area of a file is
just 0's or is not there.

In arguments that UNIX is not suitable for DP applications, sparse
files usually come up if the conversation goes on long enough between
knowledgeable people.  Some operating systems return an error which
amounts to "you can't read that because there isn't anything there."

Sparse files are quite popular for a number of Data Processing
applications.  (Effectively, you can use them for hash buckets, amongst
other applications.)  Furthermore, for some scientific applications,
sparse files can be used to store sparse matrices.  This, however,
would require finer granularity than is normally found in the sparse
storage system.  Most operating systems just check to see whether a
block is allocated which would contain the requested data and, if so,
return its contents.  Hence, a "sparse" file in which every other byte
was written would appear to an application to be continuous, and would
occupy as much disk space as a fully written file.

Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for myself alone.