Path: utzoo!attcan!uunet!cs.utexas.edu!wuarchive!usc!apple!portal!cup.portal.com!thad
From: thad@cup.portal.com (Thad P Floryan)
Newsgroups: comp.unix.questions
Subject: Re: unix file structure (or lack of same)
Message-ID: <35623@cup.portal.com>
Date: 5 Nov 90 13:37:12 GMT
References: <125379@linus.mitre.org>
Organization: The Portal System (TM)
Lines: 62

duncant@mbunix.mitre.org (Thomson) in <125379@linus.mitre.org> writes:

> I understand that, on unix, the file system is designed so that a file
> always looks like a sequence of bytes, with no record structure at all.
> Is this correct?

YES, thank goodness! Contrast that UNIX view of a "file" to that on, say, VAX/VMS where you find eleventy-seven RMS file types that complicate efficient and portable I/O beyond belief. I have a commercial product in that market, and I'm now porting it to UNIX, so this is not idle speculation.

> If so, how does one implement an efficient database manager on unix in a
> standard, portable, way? To be efficient, a database manager needs to
> have random access into files on a record-oriented basis. It seems to me
> that fseek() wouldn't do the job. (Am I wrong here?) If unix doesn't
> provide a record-oriented view of files, then any database implementation
> would have to go below unix, and access the mass storage devices
> directly. Is this right?

One can impose any "view" on the file one desires. Assuming fixed-length 'records' and no funny-stuff at the beginning of the file, a typical method to calculate any record's relative address in the file could be:

	address = (record_number - 1) * sizeof(record_structure);

and that "address" would be used per "lseek(fd, (long)address, 0);". See the writeup of lseek(2) for the meaning of its 3rd parameter which provides some interesting options.

Of course, a real DBMS could be "smarter" and calculate a block address instead, (possibly) map that into memory, and then calculate the record's in-core offset from the beginning of that buffer.
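
As a rough illustration of the fixed-length record arithmetic above, here is a minimal sketch in C. The record layout (struct record) and the helper names record_offset, put_record and get_record are made up for this example and are not from any particular DBMS. SEEK_SET is the symbolic name for the value 0 passed as the third argument to lseek(2), and record numbers are assumed to start at 1.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical fixed-length record layout, for illustration only. */
struct record {
    long id;
    char name[28];
};

/* Byte offset of a 1-based record number: (n - 1) * record size. */
static off_t record_offset(long record_number)
{
    return (off_t)(record_number - 1) * (off_t)sizeof(struct record);
}

/* Write one record into its slot. Returns 0 on success, -1 on error. */
static int put_record(int fd, long record_number, const struct record *rec)
{
    /* SEEK_SET (value 0) means "offset from the start of the file". */
    if (lseek(fd, record_offset(record_number), SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, rec, sizeof *rec) == (ssize_t)sizeof *rec ? 0 : -1;
}

/* Read one record from its slot. Returns 0 on success, -1 on error
 * (including a short read past the end of the file). */
static int get_record(int fd, long record_number, struct record *rec)
{
    if (lseek(fd, record_offset(record_number), SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, rec, sizeof *rec) == (ssize_t)sizeof *rec ? 0 : -1;
}
```

A caller would open the data file with open(2) using O_RDWR, then call put_record(fd, n, &rec) or get_record(fd, n, &rec) for any record number n in any order. Writing a high-numbered record first simply leaves a hole in the file for the lower-numbered slots. Note that the structure is written in the machine's native binary layout, so a file produced this way is not portable across architectures.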

If you're going in for really big files whose 'records' might even be variable-length, use a secondary index file(s) whose records are fixed length and "point" to the address of their associated data records in the big file. Common datafile index methods are B-tree and ISAM.

And if you're REALLY concerned about efficiency and your OS version permits it, go for either the FFS or a 4K or 8K filesystem which could even be a separate mount and dedicated to DBMS applications.

Some "database" vendors have claimed they've written their own filesystems due to perceived problems with UNIX' filesystems, but I haven't seen the need for that even with some of the humongous data files with which I operate. And a custom file system means you're going to need a custom backup-and-restore facility and the attendant special procedures.

Many standard filesystems are either 1K, 2K or 4K. This means the smallest allocated space for a given file (ignoring sparse files) would be that size. It also means that for small files you may end up with a lot of "wasted" space at the end of each file. The 1K, for example, means the logical block size comprises two 512-byte real sectors.

Stick with the "standard" software and tools for greater portability, and switch to custom methods only if the specific case warrants it. With today's modern UNIX systems and fast I/O subsystems you may be pleasantly surprised.

One final comment: you used the word "portable" often. If that is of concern, then you may wish to store your numeric data in ASCII form even though there is a conversion penalty. To move binary data files amongst systems such as a 386/486 and 680x0 and SPARC and MIPS and VAX and ... is asking for trouble, even for integer data.

Thad Floryan [ thad@cup.portal.com (OR) ..!sun!portal!cup.portal.com!thad ]