Path: utzoo!attcan!uunet!lll-winken!ames!ll-xn!mit-eddie!bloom-beacon!adam.pika.mit.edu!scs
From: scs@adam.pika.mit.edu (Steve Summit)
Newsgroups: comp.unix.wizards
Subject: Re: GNU-tar vs dump(1)
Summary: avoid system dependencies
Message-ID: <8695@bloom-beacon.MIT.EDU>
Date: 9 Jan 89 04:14:23 GMT
References: <17999@adm.BRL.MIL> <629@mks.UUCP> <11@estinc.UUCP> <10797@rpp386.Dallas.TX.US>
Sender: daemon@bloom-beacon.MIT.EDU
Reply-To: scs@adam.pika.mit.edu (Steve Summit)
Lines: 54

In article <10797@rpp386.Dallas.TX.US> jfh@rpp386.Dallas.TX.US (John F. Haugh II) writes:
(with respect to compressing "empty" blocks of zeroes)
>This problem and others can be solved by telling GNU-tar about the file
>system.  There is no reason a system utility shouldn't be aware of the
>system layout.

>How many CPU years are going to be wasted LZW'ing all those sparce blocks
>when a little file system knowelege would have saved us all that grief?

How many person years have been and will be wasted attempting to
port programs which ought to be portable but which contain
gratuitous system dependencies?

Tar can be written portably; every attempt should be made to do
so.  It has already been asserted (and I'm inclined to believe
it) that the time spent looking for zeroes to compress is
inconsequential, particularly in an I/O intensive program such as
tar.

A good example of the same problem can be found in diff: a nice,
simple text file utility which ought to be maximally portable,
and is an especially attractive porting target because nothing
like it exists on lesser systems such as VMS and MS-DOS.  Yet
part of its algorithm for distinguishing between text and binary
files involves reading a struct exec from the beginning of the
file and checking for magic numbers, which requires #including
the (very Unix-specific) .
Doing so is in fact pointless because the algorithm then goes on
(in the absence of a valid magic number) to look through the
beginning of the file for nonprinting characters, which a.out
files are virtually certain to contain.

Machine- or system-dependent code should be written only as a
last resort, when the need is clear and dire, when no portable
way of writing it can be found, and then only in utilities which
"have a right" to contain such dependencies (adb, fsck, etc.).
Tar is a file interchange program; you'll likely want to get it
working on another system some day so you can transfer things.

(Of course, non-essential system-dependent code, such as a Unix
filesystem empty block check, or diff's magic number detection,
could be surrounded with appropriate #ifdefs.  Unfortunately, it
rarely is, which leaves the eventual porter, if he isn't
experienced and isn't the author, quite uncertain as to how to
proceed, and liable to drop the project.  In the case of the
proposed filesystem knowledge for tar, an #ifdef unix wouldn't
even help, because Unix filesystem formats have been known to
change, and they can't even be assumed to be consistent on one
system any more, given the existence of file system switches and
remote file systems.  Why commit tar to all of these problems?)

>Write the code once and be done with it.

Indeed.

Steve Summit
scs@adam.pika.mit.edu
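
For illustration only, here is a minimal sketch of the two portable
checks discussed above: scanning the start of a file for nonprinting
characters, and testing whether a block is all zeroes.  It is not
taken from diff or tar; the function names looks_binary and
block_is_zero are hypothetical, and the "printable or whitespace"
rule is an assumption about what counts as text (in the default "C"
locale it treats any byte above 127 as binary).  Both routines use
only standard C headers.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if any of the first n bytes of buf is
 * neither printable nor whitespace, 0 otherwise.  An executable header
 * would normally trip this without any magic number test. */
int looks_binary(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!isprint(buf[i]) && !isspace(buf[i]))
            return 1;
    }
    return 0;
}

/* Hypothetical helper: return 1 if all n bytes of buf are zero (an
 * "empty" block), 0 otherwise.  It looks only at the data already
 * read, not at the filesystem layout. */
int block_is_zero(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```

A caller would read the first block of a file into a buffer and pass
the buffer and the byte count actually read to either routine.
Nothing here depends on the host system, so the same source should
compile unchanged wherever a standard C library is available.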