Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!uunet!husc6!rutgers!labrea!rocky!rokicki From: rokicki@rocky.STANFORD.EDU (Tomas Rokicki) Newsgroups: comp.sys.amiga Subject: Large directory or many small directories? Message-ID: <550@rocky.STANFORD.EDU> Date: Thu, 3-Sep-87 03:04:04 EDT Article-I.D.: rocky.550 Posted: Thu Sep 3 03:04:04 1987 Date-Received: Sat, 5-Sep-87 09:59:44 EDT Organization: Stanford University Computer Science Department Lines: 53 Let's first calculate how many blocks must be read to locate an executable in a particular directory. On the Amiga, directories are organized as bucket hash tables of size, say, N. (I think N is something like 74 or 90 or so, but I don't have my manuals handy.) Under such a scheme, you have to read the directory, and you have to read M additional header blocks, where the file you want is Mth in that particular bucket. If you have F files in the directory randomly distributed, M_mean = a * (F/2N + 1/2) where a is some number greater than but close to 1. (This number takes into account the fact that the buckets do not fill exactly evenly.) If the file isn't found, you need to probe on the average M_mean = a * F/N before you realize that. So let us say you have f total files, distributed over d directories. Each directory has f/d files. On the average, you will search d/2-1/2 directories before searching the correct one. Thus, your total probes are: P = (d/2-1/2)(1 + af/dN) + 1 + a(f/2dN + 1/2) = a/2 + 1/2 + d/2 + af/2N Since f, a, and N are positive constants, you need to reduce d. Thus, you want to have fewer directories. Let's put some typical numbers in there. If a = 1.1, f = 150, and N=100, then the above reduces to 1.875 + d/2. Now we start adding some real world considerations. First of all, searching a directory such as ram: or vd0: is very nearly free, so put any files there that you can. Secondly, directories might be on different disks or different partitions of the same disk, so there is some time involved in going from one directory to the next. Thirdly, you want to put the most likely directory first. Thus, if we can assume that you will only need to search d/5-1/5 directories on average before finding the correct one (which is much more likely), then the equation becomes P = (d/5-1/5)(1 + af/dN) + 1 + a(f/2dN + 1/2) = a/2 + 4/5 + d/5 + af/5N + 3af/10dN which is slighly more complex. We can minimize this function with a little bit of calculus, to show that P is minimized when it is equal to sqrt(1.5aF/N), which, for our example numbers, is about 1.58. So again, few large directories. There may be errors in this analysis, as I did it extemporaneously (and all of the math in my head) but I believe it's fairly accurate. -tom