Path: utzoo!attcan!uunet!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!haven!adm!smoke!gwyn
From: gwyn@smoke.brl.mil (Doug Gwyn)
Newsgroups: comp.unix.questions
Subject: Re: unix file structure (or lack of same)
Keywords: unix, file, database
Message-ID: <14335@smoke.brl.mil>
Date: 5 Nov 90 14:41:49 GMT
References: <125379@linus.mitre.org>
Organization: U.S. Army Ballistic Research Laboratory, APG, MD.
Lines: 38

In article <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:
>I understand that, on unix, the file system is designed so that a file always
>looks like a sequence of bytes, with no record structure at all.

To be more precise, the operating system itself does not impose any record
structure on disk files within the standard hierarchical file system.
Some device types, for example magnetic tape or a punched-card reader, might
have their own idea of what constitutes a "record".  For magnetic tape, each
record would normally have the length specified by the UNIX write() system
call that provided its data; for a card reader, records have a particular
fixed length.  Also, the terminal handler under typical operation collects
input from a terminal port up through a new-line and treats it in many
respects as a (variable-length) record, although in this case partial reads
of the kernel-buffered line are fully supported.

>If so, how does one implement an efficient database manager on unix in
>a standard, portable, way?  To be efficient, a database manager needs to
>have random access into files on a record-oriented basis.  It seems to me
>that fseek() wouldn't do the job.

For normal disk files, applications are responsible for maintaining
whatever structure they wish to use.  Clearly, lseek() is suitable for
getting directly to any known position within the file; if a fixed record
size is assumed, then the arithmetic for the byte offset is trivial.
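
To make the fixed-size case concrete, here is a minimal sketch in C.  The
names RECORD_SIZE, record_offset(), put_record() and get_record() are
invented for this illustration, and the 64-byte record length is an
arbitrary assumption; only lseek(), read() and write() are real system
calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 64  /* hypothetical fixed record length, in bytes */

/* Byte offset of record number recno, counting from 0. */
off_t record_offset(long recno)
{
    assert(recno >= 0);
    return (off_t)recno * RECORD_SIZE;
}

/* Write one RECORD_SIZE-byte record into slot recno.
 * Returns 0 on success, -1 on failure. */
int put_record(int fd, long recno, const char *buf)
{
    if (lseek(fd, record_offset(recno), SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, buf, RECORD_SIZE) == RECORD_SIZE ? 0 : -1;
}

/* Read the record in slot recno into buf (at least RECORD_SIZE bytes).
 * Returns 0 on success, -1 on failure or if the slot is past end of file. */
int get_record(int fd, long recno, char *buf)
{
    if (lseek(fd, record_offset(recno), SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, RECORD_SIZE) == RECORD_SIZE ? 0 : -1;
}
```

Given a descriptor obtained from open() with O_RDWR, get_record(fd, 3, buf)
seeks to byte 192 and reads 64 bytes.  Slots may be written in any order; a
slot that was skipped over reads back as zero bytes.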

For variable-sized records, a variety of organizations are possible.
(In fact, this flexibility is a big win for the UNIX approach.)  A typical
one uses
a separate "index file" with fixed, small record size that points into
a large variable-sized record database file.  B-trees and other structures
are also commonly used.
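
The index-file organization might be sketched in C as follows.  Everything
named here (struct index_entry, append_record(), fetch_record()) is
hypothetical and chosen only for illustration; a real database manager
would add locking, error recovery and probably a machine-independent
on-disk layout for the index entries.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size index entry: locates one record in the data file. */
struct index_entry {
    off_t  offset;  /* byte position of the record in the data file */
    size_t length;  /* length of the record in bytes */
};

/* Append a variable-length record to the data file and a matching entry
 * to the index file.  Returns the new record number, or -1 on failure. */
long append_record(int data_fd, int index_fd, const void *rec, size_t len)
{
    struct index_entry e;
    off_t index_end;

    e.offset = lseek(data_fd, 0, SEEK_END);
    if (e.offset == (off_t)-1)
        return -1;
    e.length = len;
    if (write(data_fd, rec, len) != (ssize_t)len)
        return -1;

    index_end = lseek(index_fd, 0, SEEK_END);
    if (index_end == (off_t)-1)
        return -1;
    if (write(index_fd, &e, sizeof e) != (ssize_t)sizeof e)
        return -1;
    return (long)(index_end / (off_t)sizeof e);
}

/* Fetch record recno into buf, which holds up to cap bytes.
 * Returns the record length, or -1 on failure. */
ssize_t fetch_record(int data_fd, int index_fd, long recno,
                     void *buf, size_t cap)
{
    struct index_entry e;

    assert(recno >= 0);
    /* The index has fixed-size entries, so its offset arithmetic is trivial. */
    if (lseek(index_fd, (off_t)recno * (off_t)sizeof e, SEEK_SET) == (off_t)-1)
        return -1;
    if (read(index_fd, &e, sizeof e) != (ssize_t)sizeof e)
        return -1;
    if (e.length > cap)
        return -1;
    /* The entry says where the variable-sized record lives in the data file. */
    if (lseek(data_fd, e.offset, SEEK_SET) == (off_t)-1)
        return -1;
    if (read(data_fd, buf, e.length) != (ssize_t)e.length)
        return -1;
    return (ssize_t)e.length;
}
```

Each lookup costs two seeks and two reads regardless of how large the data
file grows.  Writing the struct directly, as this sketch does, ties the
index file to one machine's byte order and type sizes.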

>If unix doesn't provide a record-oriented view of files, then any database
>implementation would have to go below unix, and access the mass storage
>devices directly.

No, not at all, although a couple of database managers do support direct
access to raw devices in order to bypass the kernel overhead of the
block-buffered, inode-based file system.