Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!rutgers!mit-eddie!uw-beaver!microsoft!w-colinp From: w-colinp@microsoft.UUCP (Colin Plumb) Newsgroups: comp.sys.amiga.tech Subject: Faster file systems (the disk format) Summary: Will somebody *please* disagree with my decisions? Keywords: BOOH Message-ID: <20@microsoft.UUCP> Date: 29 Jan 89 04:10:19 GMT Reply-To: w-colinp@microsoft.uucp (Colin Plumb) Organization: very little Lines: 127 Well, I've settled on a disk format for my faster file system. I use variable-length filename and comment fields, since everything is variable- sized anyway. I'm rather proud of the space savings over standard AmigaDOG, so I'm bragging to the net. (Really, I want *negative* comments.) For those who didn't see my other postings, you may get a bit lost, but the important thing is that this is exclusively for Amiga floppies. That means that I'm willing to do a lot of work in memory to avoid more seeks, and space utilisation is important. Anyway, it goes like this: - A disk is 160 tracks, with track 0 having 1024 bytes reserved for boot blocks. What's a sector? Never heard of them... :-) - Data is allocated within tracks on single-byte boundaries. - A track is a collection of extents, each with a 4-byte header (inode # and size), and possibly a free extent (which is handled in a slightly more complex way, since it can be only 1 byte long). The inode number is the inode number for inodes (gee), or the inode number of the owner of a data extent. An inode number encodes the track it's on, in a slightly tricky way, as it's possible to have 276 inodes on a track, but as there are 160 tracks, we can't just have 9 and 7 bit fields. (Yes, it's possible to fit over 32768 empty files on a floppy. I was rather amazed at this result myself.) - Timestamps are stores in 1 32-bit word, a la Unix, not in 3, a la AmigaDOG. - The boot blocks contain a pointer to the free map extent, which has some magic inode field in its header, and contains 4 bytes of volume creation time, 4 bytes of volume modification time, 320 bytes of free map (2 bytes of count of free bytes for each track) and 2 bytes of inode # of the root directory. A count of -1 free bytes on a track signals an invalid free map. Why waste a byte? :-) - A directory, in addition to its 4-byte header, contains 4 bytes of timestamp, 4 bytes of protection, 2 bytes of parent inode (a magic value for the root directory), 1 byte of name length, 1 byte of comment length, strlen(name) + strlen(comment) bytes of name and comment, and 3 bytes per entry. The number of entries is computed from the length of the extent stored in the header. An entry is 2 bytes of inode number and 1 byte of hash value. The hash function hasn't been decided yet. The fact that all directory entries are pointed to directly by the directory means that I can sort all the inodes by track when doing a directory listing, and preferentially allocate new directory entries on tracks others are already on. - A file is the same, except the entries are pointers to extents, and are 1 byte of track # and 2 bytes of length. Files and directories are told apart by the high bit of the name length field, since I don't, to be honest, see any crying need for file names more than 127 bytes long. - If, by some perverse combination of allocation and deallocation, a file has multiple extents on a single track, they are distinguished by being stored in order, so if extent #15 and #100 of a file are both on track 12, then extent #15 will be stored first. Extents don't have associated sequence numbers. A file's inode is conceptually extent -1 for this purpose. Note that inodes can be recognised from just the header, despite the lack of explicit flags, since the inode number in the header shows the inode is on this track, and the inode is the first extent of all those on this track labeled with that inode number. So, having said all that, we can compute the space taken up by the formatting of a disk. 1K of boot blocks, 4 bytes of free list header, 330 bytes of free list, 4 bytes of root directory header, and 16 bytes of root directory, plus the length of the name of the disk. Assuming a 1-character name, that's 355 bytes of overhead, 1379 including the boot blocks. AmigaDOG has 1 block of root directory and 1 block of bitmap, for 1K of overhead, 2K with the boot blocks. Another figure is the overhead associated with a small file (one that's stored in one extent, almost a certainty for files up to a few K). 3 bytes of entry in the parent directory, 4 bytes of inode header, 16 bytes of inode, assume 1 byte of name, 3 bytes of extent pointer, and 4 bytes of data extent header, for a total of 31 bytes of overhead. AmigaDOS would have a block (512 bytes) of file header, plus the 24 bytes of data in the data block, plus the internal fragmentation (256 bytes average). The maximum single file that will fit on a disk has 160 extents, on on each track, with 7 bytes of overhead per extent, plus the 24 bytes of file overhead computed above, for 1120 bytes of overhead. AmigaDOG would have 1 block of file header per 72 blocks of data, 25 blocks (12800 bytes) total, plus the block header information. In other words, the largest single file you could fit on a floppy would have a total of 1379+1120 = 2499 bytes of file-system overhead. At 880K (= 901120 bytes), that's 898621 bytes, 877.56K. However, why restrict ourselves to 880K per floppy? Each sector also has 16 bytes of "OS recovery info", 27.5K total on the floppy. It's trivial to use this area, as well. This gives us an extra 28160 bytes, for 926781 bytes maximum file size, 905.06K. Mac enthusiasts don't seem to notice the difference between 800K and 880K (heck, it's hard to *see* with slashed zeros). Do you think they'll notice the difference between 800K (less file-system overhead) and 900K (*including* file system overhead)? Okay, so I'm bragging... Now, the reason I posted all this is to find someone who'll tell me I'm doing something stupid. The main things I'm wondering about are: - Is it okay to use the "OS recovery info" areas like this, and - Is it okay to use the 16-byte header areas of the boot blocks? The former, I think is okay, even though I'm storing user data there, not quite in accordance with the original plans. The latter, I'm not so sure about. It's only 32 bytes, but it's a trifle messier in the logic not to use them. Other points: - Even though I can get something like 36,000 empty files on a floppy, I can't get more than 1929 entries in a single directory (i.e. more files than will fit on a whole floppy under the current AmigaDOG). Does anyone think there's something repugnant about this? - Can anyone suggest a good hash function? Otherwise, I'll just hunt through Knuth and pick one that strikes my fancy. You want to turn an arbitrary-length filename into 8 random bits. Since this must be case- insensitive and filenames are mostly letters, just masking off the 32 bit is probably easiest and doesn't cost any efficiency, but if you have something ingenious in mind, go ahead. Comments, anyone? -- -Colin (uunet!microsoft!w-colinp)