Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!linus!philabs!cmcl2!harvard!caip!lll-crg!lll-lcc!ucdavis!ucbvax!jade!ucbopal!mwm From: mwm@ucbopal.berkeley.edu (Mike (I'll be mellow when I'm dead) Meyer) Newsgroups: net.micro.amiga Subject: Re: AmigaDos... Message-ID: <670@jade.BERKELEY.EDU> Date: Wed, 7-May-86 07:18:45 EDT Article-I.D.: jade.670 Posted: Wed May 7 07:18:45 1986 Date-Received: Sun, 11-May-86 04:49:07 EDT References: <1076@h-sc1.UUCP> <615@jade.BERKELEY.EDU> <1084@h-sc1.UUCP> Sender: usenet@jade.BERKELEY.EDU Reply-To: mwm@ucbopal.UUCP (Mike (I'll be mellow when I'm dead) Meyer) Distribution: net Organization: Missionaria Phonibalonica Lines: 140 In article <1084@h-sc1.UUCP> breuel@h-sc1.UUCP (thomas breuel) writes: >It can take quite some time to list all the files in a directory, >into the 60 sec range for some of the disks I have. You must have some LARGE directories. The worst case I can find is fish disk #13, with 106 files and 6 directories, which takes less than 32 seconds to list. My c: directory (with 49 files) takes less than 12 seconds to list. Assuming that things are linear, I've gotta conclude that you've got directories with over 200 files in them. Even my junk directory on Unix isn't that bad (yet). >Seeking through the two or three linearly linked allocation blocks >on disk can be excrutiatingly slow, however, if you have to do >it for every seek, and if they are scattered. This seems to be the >case for several of my disks. I can assure you that in several cases >I had a strong dependence of seek time on position in the file, i.e. >very fast seeks for the beginning, and long delays for seeks near the >end of the file. Hmmm. I definitely saw no such correspondence. Seeks after the first took ~30 ticks no matter where in the 700K+ file they were, the first seek taking between 1.5 and 2.5 times that long, again seemingly position independent. I started with a fresh disk, so that there should be little or no fragmentation of the disk. If you didn't do that, you might try copying your data base to a freshly formatted disk, and seeing how long it takes. >Some arithmetic (correct me if I'm wrong): a file header block >or a file list block contains 72 block pointers, which represent >36k of data in a file. For a 720k file, the allocation table >can therefore grow to 20 blocks. For a search through a linear >chain of blocks, this is not insignificant anymore. You're right - I goofed. For some reason, I was thinking of 1K blocks, so each allocation block gets you 200 pointers for about 200K. Sorry. >|The comment >|about doing globbing is true; I still can't figure out why they used >|globbing so heavily in the WorkBench. > >It's because people like me forget their filenames so often :-). >A very large percentage of my CLI commands are dir's (on UN*X, >ls's). Likewise, most people probably prefer to be shown a dialog >box listing all available files, ready for click selection, to >a string dialog box. No, that's why *YOU* use globbing so heavily, not why the workbench does (opening *.info when you open a window). I don't run dir's and/or ls's all that often - but I use file completion a lot :-). >I was not suggesting a file system *identical* to UN*X. But there are >other alternatives. Again, I personally favour the idea of tree-like >allocation tables (because it makes truely random accesses affordable). >Also, it can actually help save space greatly if it allows for small >'seedling' files (just one allocated block) like ProDos. Tree-like allocation tables can be a good thing. Allowing both shouldn't be hard; it's got to be easier than the blocks/frags stuff that 4BSD does. Dynamically switching from a "linked list" of two to a log-tree when you hit three would probably be a good thing, if you're going to do both anyway (anyone at C/A listening?). >I believe that the loss of efficiency in 'open' by putting names back >into directories is tolerable. Putting names back in directories would cut the number of file entries per directory block by a factor of 8. It'd also require a redesign of the file system from the ground up, as you've gotta throw away the hashing and replace it by something else - say a linear search of the blocks in the directory, one block per 9 directory entries (roughly). >Furthermore, for large directories, >the name/inode-pair directory wins over the hash/bucket-list directory >again. If done carefully, the only thing that is lost in terms of >recoverability is that you may not be able to recover a file's name. >Otherwise, such a file system can be as robust as the current Amiga file >system. Hmmm - I'm beginning to suspect that we're using the same term for different things. Let me take time out to briefly describe the Unix tree structure, so that I know we're using the same terminology. The heart of the Unix directory tree is the ilist. This is a set of blocks at fixed locations on the file system, usually at the front (front of each cylinder group for 4BSD). Each inode in the ilist contains *all* information pertinent to a file - owner, size, status bits, pointers to blocks of data. A directory is a list of names and inode numbers, the inode number being the index into the ilist. Given this, it's obvious that an open looks like a loop that does: "search directory for name, and get inode number associated with name" "Go get block with that inode in it, so we'll have a list of blocks for the next directory" From this, it's obvious that directories with name/inode pairs in them take 2 disk accesses per element in the path. AmigaDOS takes 1 disk access per element in the path (both cases assuming that directories are "small", and ignoring the the overhead to get the file once you've find it). Putting names back in the directory will just make the definition of "small" directories much smaller. On the other hand, going to name/inode pairs in the directory gets you back to 2 disk accesses per path element. Worse yet, you either make the ilist a linked list (in which case, you'll learn what SLOW really means), or it lives in a known location on disk, meaning that a lost ilist block is a catastrophe. You could probably make the name/inode organization as robust as the AmigaDOS organization by following the same file header/block header structure. BTW, you can loose file names if you loose the file header in AmigaDOS. I think that's the worst case, though. Finally, the contention that large directories win with inode/name organizations - I disagree. Here's where the AmigaDOS caching finally comes in. Let's assume that a "small" directory requires one block on Unix, and has no collisions on AmigaDOS. For convenience, assume that "small" directories are the same size on both systems, and have N file entries (N for AmigaDOS is 72 (right?); for Unix, it's 32 for v7oid systems, a minimax of 42 on 4BSD, and probably one of 32/64/128 on sysV). Then access to a file in a small directory is the same on both systems - 1 disk access. For a directory that's got M*N entries, it's M/2 (average) on Unix to search the directory blocks - *not counting the overhead to search the allocation table*; and M/2 (average) on AmigaDOS to search the hash chain for that entry - NO EXTRA OVERHEAD. So AmigaDOS wins by an (over)head :-). Of course, for listing the files in the directory, AmigaDOS still has one block/file, plus the directory. Unix keeps getting N files names/block (again ignoring overhead). File system design - like the rest of OS design - is non-trivial, and correct solutions for one application/environment aren't correct - or even usuable - for another application/environment. The Unix OS on an Amiga, implemented roughly like the AmigaDOS file system is implemented (no caching!), would be a total disaster for almost anything. The AmigaDOS file system isn't perfect for the Amiga environment, and some of the Unix ideas would be an improvement (the log-tree allocation table and soft links ('aliases' under AOS, I think) just to start). But I have to agree with Matt Dillon that what the AmigaDOS file system needs most is disk caching.