Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!coherent!dplatt From: dplatt@coherent.com (Dave Platt) Newsgroups: comp.sys.mac Subject: Re: Inefficiency with DiskFit 1.5 during backups Summary: No "best" solution is practical Keywords: Knapsack problem, NP-complete Message-ID: <23539@coherent.com> Date: 12 Apr 89 22:56:47 GMT References: <1122@atux01.UUCP> Reply-To: dplatt@coherent.com (Dave Platt) Organization: Coherent Thought Inc., Palo Alto CA Lines: 92 In article <1122@atux01.UUCP> jlc@atux01.UUCP (J. Collymore) writes: > I am having a small problem with DiskFit 1.5. I use DiskFit 1.5 to > backup my entire 30Mb external hard disk everyday. Right now I am > holding at around 20- 21 Mb on the hard disk. Optimally, this should > require 27 or 28 disks, but instead it is using about 31 disks. I find > that SOME disks, after doing incrementals, have anywhere from 50K to > 250K unused! I thought DiskFit was to shuffle things around so that it > weould make maximum use of ALL free space on the backup disks, but > repeatedly since I began using DiskFit 2 years ago, this in NOT the > case! Jim: you've just described a very difficult problem. The job of packing M objects (files) of different sizes into N "buckets" (diskettes), so that the total space used/wasted is minimized, is one of a large class of problems that are "NP-complete". No one has ever designed an algorithm that will produce the "best" solution for any of these problems that is capable of running in less than "exponential" time. It turns out that the only way known to find the "best" solution to one of these problems is to examine _every_ alternative, and then pick the best one... there's no way to prune the search-space down to a reasonable size, unless you're willing to abandon hope for the "best" solution and accept one that's merely "good". The number of possible alternative solutions goes up _very_ fast as the number of objects increases. Adding one file to the problem may double or triple the number of alternatives that must be considered... or even worse! I believe that the time required to calculate the optimal packing of as few as 300 files on 30 diskettes would take your Mac longer than the remaining lifespan of the Sun. This doesn't include the time necessary to write the files to disk ;-}. If you compare the "best" solutions for a collection of M files, and the best solution for M+1 files (i.e. add one new file, with all of the others being unchanged), then you will frequently find that the solutions look entirely different... all of the diskettes will have to be rearrange to move from the M-file case to the M+1-file case. No, DiskFit does not "shuffle around" files in the way that you're thinking. If a file has not been changed on your hard disk, then DiskFit will _not_ move the backup-copy around from one disk to another in order to permit tighter file-packing. This could greatly increase the amount of time that an incremental backup would require... in effect, any incremental backup could take just as long as a full backup. I'm not sure what algorithm DiskFit uses to decide how to group files together. I do know that it tends to place files from a specific folder onto the same set of diskettes... this helps keep the number of HFS directories on each diskette to a relatively small number (a Good Thing for several reasons) and makes it somewhat easier to locate and restore files by hand (from the Finder) if you should wish to do so. DiskFit could perhaps pack files more tightly together if it were willing to split even small files across multiple floppies. Currently, it splits files only if they're too large to fit on a single floppy. I would not wish to see this behavior change... it'd make manual file restoration rather more of a pain. > What I have to do periodically is trash the DiskFit Info file in the > system folder and start my backup FROM SCRATCH to ensure that each > disk's space is used optimally. At the 21Mb level, this can free up > anywhere from 1 to 3 disks that I was otherwise using in my "SmartSet." > I wish I did NOT have to use this method, because (with Write Verifies) > this procedure takes 45 - 60 minutes for 21Mb! Doing a full-backup-from-scratch is a good idea occasionally, anyhow, especially if you reinitialize your floppies first... this gives you an additional guarantee that the floppies themselves are still good. Is it so bad a thing to have a SmartSet that's 31 diskettes long, rather than 27 or 28? I figure that the slight inefficiency is costing you perhaps $6 per SmartSet... hardly a backbreaking cost. Re what you can do... not much. In my experience, DiskFit seems to keep the SmartSets within about 10% of being fully-utilized, even if I've been doing incremental backups to the same set for weeks or months. The space-waste doesn't seem to rise above 15% unless I delete a large number of files... and the space gets reused when I add new files to the hard-disk. On a more theoretical level... if you can devise an algorithm for any NP-complete problem that will produce an optimal solution while requiring only a polynomial-order amount of time and space, you'll be famous... I mean that quite seriously. If you can _prove_ that it cannot be done, you'll be equally famous. The NP-completeness problem is to Computer Science what Fermat's Last Theorem is to mathematics. -- Dave Platt FIDONET: Dave Platt on 1:204/444 VOICE: (415) 493-8805 UUCP: ...!{ames,sun,uunet}!coherent!dplatt DOMAIN: dplatt@coherent.com INTERNET: coherent!dplatt@ames.arpa, ...@uunet.uu.net USNAIL: Coherent Thought Inc. 3350 West Bayshore #205 Palo Alto CA 94303