Path: utzoo!mnetor!uunet!husc6!rutgers!mtune!codas!killer!usl!usl-pc!jpdres10 From: jpdres10@usl-pc.UUCP (Green Eric Lee) Newsgroups: comp.os.misc Subject: Re: Contiguous files; extent based file systems Message-ID: <517@usl-pc.UUCP> Date: 20 Dec 87 02:14:00 GMT References: <561@amethyst.ma.arizona.edu> <3228@tut.cis.ohio-state.edu> <177@cullsj.UUCP> <1931@rti.UUCP> Distribution: na Organization: Univ. of Southwestern La., Lafayette Lines: 71 Summary: Good opportunity for benchmarking Keywords: Expires: Sender: Reply-To: Followup-To: In message <1931@rti.UUCP>, trt@rti.UUCP (Thomas Truscott) says: >In article <177@cullsj.UUCP>, gupta@cullsj.UUCP (Yogesh Gupta) writes: >> ... . However, I find that if I create a 100MB file >> under Unix (BSD 4.2, System V Rel 1), the overhead in >> randomly accessing various parts of it is too high (due to indirect >> inode structures). Any comments? >Sequential I/O certainly seems linear with file size. Yes, because of block caching knarfing the indirect blocks. >I think random access would be the same. Not necessarily. You may need to re-read indirect inodes, or other things of that sort. >The larger file needs ~ceil(20 000 000/(8192/4)) == 2 indirect blocks, >and any reasonable I/O system will keep both in the buffer >cache at all times. BSD, perhaps. I'm at home, so I don't have access to the BSD FFS documentation. But, let's see what Sys V does (quick, flip to back of Bach Book): Well, most Sys V systems that I've seen have 1K logical blocks (physical blocks may be larger). A 100mb file would thus have 100,000 logical blocks. Single indirect would take us up to 266 blocks. Double indirect would take us up to 65802 blocks. We'd be well into the triple-indirect. Now, for sequential access, it doesn't matter, because those triple-indirect blocks will be stored in the block cache. But for random access... consider that we have 390 blocks full of indexes to those 100,000 blocks, not even counting the indirects... and considering that accessing those 100,000 blocks is likely to displace the earliest-accessed of those 390 blocks of indexes from the block cache (because the data blocks will be cached, of course -- the disk algorithms don't know the difference between blocks full of block numbers, and blocks full of data), thus eliminating any cacheing effects... you may average 2-3 disk hits in order to access a block of data, if you access the data randomly enough (note that 4 disk hits is worst-case, first-time you hit a tail block... but it's quite likely that the first of those indirect blocks, at least, is going to stick around in cache for awhile). In other words, a large scale random access program quite possibly will run at half speed. I'd like it, too, if someone benchmarked this (we don't happen to have any file systems handy with 100mb free, or I'd do it myself :-). >Rather than making files contiguous I would prefer larger blocksizes. >For example blocksize == tracksize lets a smart disk controller >eliminate rotational delay. That works only when you have a file system like the BSD FFS which can allocate fragments of blocks. Otherwise, most of the space is wasted. Also note that it makes the job of the fragment allocator much more... fragmentary. Contigious files can greatly speed things, in conjunction with things such as track buffers. For example, allocating the icons contigiously can more than double the speed of pulling up icons under AmigaDOS/Intuition. Unfortunately, contigious allocation greatly increases the complexity of the file system, much more than BSD's fragment system (since there's a fixed number of 256 byte fragments in a 4k-8k block). I'd have to think about it for quite awhile to come up with a contigious allocation scheme that worked, that was transparant to applications, and, most importantly, had adequate speed on currently-existing machines. History is filled with machines that died because of elegant algorithms that sucked up more CPU time than they were worth (an example of such an algorithm is the Multics process scheduling algorithm, which took a dozen factors including phase-of-moon into account when trying to decide what process to run next ;-). -- Eric Green elg@usl.CSNET P.O. Box 92191, Lafayette, LA 70509 {ihnp4,cbosgd}!killer!elg, {ut-sally,killer}!usl!elg "what exactly is a dream... and what exactly is a joke?" -- Syd Barrett