Path: utzoo!attcan!uunet!usenix!jsq
From: jsh@usenix.org (Jeffrey S. Haemer)
Newsgroups: comp.std.unix
Subject: Standards Update, ANSI X3B11.1: WORM File Systems
Message-ID: <525@usenix.ORG>
Date: 18 Sep 90 23:37:37 GMT
Sender: jsq@usenix.ORG
Reply-To: std-unix@uunet.uu.net
Organization: USENIX Standards Watchdog Committee
Lines: 209
Approved: jsq@usenix.org (Moderator, John Quarterman)
X-Submissions: std-unix@uunet.uu.net

Submitted-by: jsh@usenix.org (Jeffrey S. Haemer)

An Update on UNIX*-Related Standards Activities

September 1990

USENIX Standards Watchdog Committee
Jeffrey S. Haemer, Report Editor

ANSI X3B11.1: WORM File Systems

Andrew Hume reports on the July 17-19, 1990 meeting in Murray Hill, NJ:

Introduction

X3B11.1 is working on a standard for file interchange on write-once media (both sequential and non-sequential (random access)): a portable file system for WORMs. The fifth meeting was held at Murray Hill, NJ on July 17-19, 1990. We adopted a working paper and set to work on a list of issues suggested by the chair.

Data Compression

Despite the huge capacities of WORM disks, people always want more. Data compression is an easy way to supply more and, on current machine architectures, can probably speed data access by trading CPU cycles for I/O bandwidth. Its main problem is that you need to support more than one algorithm, and thus need some way to specify algorithms. This is a purely administrative issue, but luckily, it appears that X3 may soon act as a registry for compression algorithms (driven by the need to register compression algorithms for IBM 3480 cartridge tape work in X3B5). (How does this fit in with the rumblings about compress from POSIX.2? I'm not certain. I think part of joining the registry means giving up patent rights or allowing liberal licensing, but maybe not. After all, the CD formats are now an ISO standard, but I still think you have to be licensed to make them.)

Path Tables and Extended Attributes

Path tables were removed from the working paper. We agreed to support hard and symbolic links. The next question was how to handle ``secret'' files: files primarily intended for system use. Examples might include the file describing free space, associated files (like the resource fork of a Macintosh file), and extended attributes (of a Microsoft HPFS file). We agreed that the latter two cases should be handled by regular files that probably are not in the directory tree but are pointed to by the ``inode'' for a file. (Note that this implies there is a way to scan all the files in a volume set without traversing the directory tree(s), analogous to running down the inodes in UNIX.) Given this, we have decided to support extended attributes as a ``secret'' or system file (and probably include pointers to things like resource forks as those attributes). This also gives us an extensible way of handling non-standard or non-essential inode fields. One of the important tasks remaining is to decide which fields are more-or-less mandatory (such as modify time, owner) and which can safely be pushed off into the extended attributes (access control lists, file valid after date). Please send us your suggestions!

__________
* UNIX is a Registered Trademark of UNIX System Laboratories in the United States and other countries.

Space Allocation and Management

We agreed that we have to support preallocating space for files, freeing some or all of that space, and then reusing that space for other files. After much discussion about extent lists and bit maps, we compromised on a scheme based on extent lists (the details to be worked out by the working paper editor).
The idea is that the free space is described by an extent list (of small but specifiable size) of the ``best'' (probably largest) free spaces, and if this overflows, the ``worst'' free spaces are added to a system file representing all the free spaces not in the above extent list.

Checksums

It was decided that all system data structures would include a 16-bit checksum (CRC-16). We anticipate that most errors will be transient (cabling or memory) rather than media errors.

Multi-Volume Sets

I had thought the last meeting had settled just about all the questions about multi-volume sets; I was wrong. It took most of a day to agree on these.

- You have to have the last volume in order to grok the whole volume set (access any/all of the directories and files).
- You can extend volume sets at any time. This and the previous item taken together imply the existence of ``terminal'' volumes (which can act as master volumes of a volume set) and ``nonterminal'' volumes (the rest). For example, if I extend a single-volume volume set by two volumes, then volumes 1 and 3 are terminal and volume 2 is not.
- You can extract file data from any volume by itself. This is meant only for disaster recovery (I dropped the master volume down the stairwell) and doesn't imply any requirements on directory tree information (much as fsck restores unattached inodes to /lost+found).
- Volumes can refer to data (say, extents) on other volumes (both earlier and later volumes). Preallocated space on any volume in a volume set can be returned for future reuse.
- The address space of logical blocks for the volume set will be 48 bits: 16 bits for the volume number and 32 bits for the logical block number within a volume. Media can be big (200GB helical scan media exist now) so 32 bits may seem barely big enough, but in such cases you can use a big logical block size.
For example, a logical block size of 16KB implies a limit of 64 terabytes per volume; this should be ample for a few years.

Defect Management

We spent a lot of time on this and learned a lot, but basically put it off to the next meeting. What we mean by ``defect management'' is ``How do we deal with write errors from the file system's point of view?'' (We ignore the disk controller and the device driver, both of which do some unknown amount of more-or-less transparent error management.)

We discussed the ``sane'' approach: insert a layer between the file system and the device that handles errors, allowing the file-system code to assume an error-free interface. This apparently good idea is ruled out by slip-sectoring, a (to my mind bogus) technique, which says, ``if writing block n fails, then try subsequent blocks (n+1, n+2, ...) until we succeed.'' Slip-sectoring is mainly used to enhance performance (it does ensure that blocks are more-or-less contiguous), and some disk controllers use it as their error-management technique. (This really screws up your logical address space; it is legitimate for a SCSI disk, your typical error-free, logical-address-space disk interface, to write logical block 5 at physical block 5, then logical block 1 at physical block 4 (physical blocks 1-3 were write errors), then disallow I/O to logical blocks 2, 3, and 4 because there is no place to put them; these blocks just vanish!)

As preparation for the next meeting, Don Crouse, who deals mainly with high-end machines like Crays and large IBMs, is writing a position paper on performance, and members of the committee, many of whom are drive manufacturers or integrators, are collecting estimates of the error rates we have to deal with. (This matters; I see one bad block out of 100,000, but some people have used drives with a bad block in every 100.)
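
The slip-sectoring scenario described above (logical block 5 written first, then logical block 1, on a disk whose physical blocks 1-3 fail) can be replayed with a toy model. This is only a sketch under stated assumptions: the five-block disk, the names DISK_SIZE, BAD_BLOCKS, slip_write and simulate, and the rule that a write starts at the physical block matching its logical number are all hypothetical illustrations, not anything a real controller or the working paper specifies.

```python
# Hypothetical model of slip-sectoring on a tiny 5-block write-once disk.
DISK_SIZE = 5            # physical blocks are numbered 1..5 (assumption)
BAD_BLOCKS = {1, 2, 3}   # physical blocks that fail on write (assumption)


def slip_write(logical_block, placement):
    """Write logical_block at its own address, slipping forward on failure.

    placement maps physical block -> logical block already stored there.
    Returns the physical block used, or None if no block is left.
    """
    for physical in range(logical_block, DISK_SIZE + 1):
        if physical in BAD_BLOCKS or physical in placement:
            continue     # write error, or block already written (write-once)
        placement[physical] = logical_block
        return physical
    return None


def simulate(write_order):
    """Replay a sequence of logical-block writes; return logical -> physical."""
    placement = {}
    return {lb: slip_write(lb, placement) for lb in write_order}


result = simulate([5, 1, 2, 3, 4])
for logical_block, physical in result.items():
    print("logical", logical_block, "-> physical", physical)
```

In this model logical block 5 lands at physical block 5, logical block 1 slips to physical block 4, and logical blocks 2, 3 and 4 get no physical block at all, which is the "vanishing" behavior complained about above.
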
The problem is that WORMs have really slow seek times, and when you are pouring a 50MB/s Cray channel at a set of WORMs, you can't afford to spend 1-2 seconds seeking to the bad block area. I personally think we should just do regular bad-block mapping (like most SMD disk drivers) out of a special system file, and people with performance concerns should arrange to have this space spread over the disk.

Endian-ness

A poll was taken of who really cared which way integer fields were stored; the results were LSB - 1, MSB - 1, Don't Care - 11. It is awkward to specify just one of LSB and MSB, since that puts half the systems out there at a competitive (performance) disadvantage (though I am skeptical of whether it's significant). Even though we're specifying an interchange standard, the group felt that most interchange would be between systems of the same endian-ness, so we should, somehow, allow native byte order. Accordingly, we agreed that endian-ness will be specified in the volume header (for the whole volume set).

In retrospect, I think this was silly; we should have just picked one way. In order that everyone important be evenly disadvantaged, we could have used some byte order like 3-0-1-2 that no one uses.

Finale

The committee is trying to nail down a firm proposal for balloting. We anticipate a substantial amount of change at the next meeting (Oct 16-18 in Nashua, NH) and have reserved time (Dec 11-13, but no place) for an additional meeting so that we can ballot after the following meeting (Jan 29-31, Bay area). We now have a working paper (available by the end of September or so); I think it likely we can meet this schedule, but who knows.

Anyone interested in attending any of the above meetings should contact either the chairman, Ed Beshore (edb@hpgrla.hp.com), or me (andrew@research.att.com, research!andrew, (908)582-6262).
I am also soliciting your comments on necessary inode fields and defect management. I will present anything you give me at the next meeting.

September 1990 Standards Update          ANSI X3B11.1: WORM File Systems

Volume-Number: Volume 21, Number 116
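
As a footnote to the Multi-Volume Sets item on the 48-bit logical block address, the arithmetic behind the 64-terabyte figure can be checked with a short sketch. The report only gives the field widths (16 bits of volume number, 32 bits of logical block number); putting the volume number in the high-order bits, and the names pack_address, unpack_address and volume_capacity_bytes, are assumptions made purely for illustration.

```python
VOLUME_BITS = 16   # bits for the volume number (from the report)
BLOCK_BITS = 32    # bits for the logical block number within a volume


def pack_address(volume, block):
    """Combine volume and block into one 48-bit integer.

    Placing the volume number in the high-order bits is an assumption;
    the report does not say how the two fields are ordered.
    """
    if not 0 <= volume < (1 << VOLUME_BITS):
        raise ValueError("volume number does not fit in 16 bits")
    if not 0 <= block < (1 << BLOCK_BITS):
        raise ValueError("logical block number does not fit in 32 bits")
    return (volume << BLOCK_BITS) | block


def unpack_address(address):
    """Split a 48-bit address back into (volume, block)."""
    return address >> BLOCK_BITS, address & ((1 << BLOCK_BITS) - 1)


def volume_capacity_bytes(logical_block_size):
    """Largest addressable volume size for a given logical block size."""
    return (1 << BLOCK_BITS) * logical_block_size


# 2**32 blocks of 16KB (2**14 bytes) each is 2**46 bytes per volume.
capacity = volume_capacity_bytes(16 * 1024)
print(capacity // 2**40, "terabytes per volume")
```

With a 16KB logical block this gives 2**46 bytes, i.e. 64 terabytes (counting a terabyte as 2**40 bytes), matching the figure in the report.
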