Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!decwrl!shelby!eos!amelia!sun217!truesdel
From: truesdel@sun217..nas.nasa.gov (David A. Truesdell)
Newsgroups: comp.unix.wizards
Subject: Re: bigger longs (64 bits)
Message-ID: <4882@amelia.nas.nasa.gov>
Date: 12 Feb 90 22:34:25 GMT
References: <11071@encore.Encore.COM> <4812@amelia.nas.nasa.gov> <11083@encore.Encore.COM> <605@bbxsda.UUCP> <4849@amelia.nas.nasa.gov> <17902@rpp386.cactus.org>
Sender: news@amelia.nas.nasa.gov
Lines: 75

jfh@rpp386.cactus.org (John F. Haugh II) writes:
>In article <4849@amelia.nas.nasa.gov> truesdel@sun217..nas.nasa.gov (David A. Truesdell) writes:
>>A striped (or striping) filesystem is one in which the filesystem is spread
>>out over a set of disks in order to increase capacity and/or performance and/or
>>reliability.

>I think you've described three different types of file system schemes.

No, there are a lot of different filesystem schemes which can display these
same attributes (capacity, performance, reliability) to differing degrees.

>Striping, from what I've seen, refers to laying consecutive cylinders out
>on consecutive drives so that a seek on one drive can occur at the same
>time as the transfer on the next drive, thus, seeks are free for sequential
>reads.

Another variation places consecutive blocks on drives with different data
paths, which can increase the I/O transfer rate above that of an individual
drive (or data path).  Seeks would be concurrent, too.

>Another strategy is mirroring, which puts redundant copies of the data
>on one or more drives [ usually more than one ] to increase the reliability
>of the data.  A drive system with two 50,000Hr MTBF drives mirroring each
>other would have a MTBF of decades or centuries instead of years.  A failed
>drive could be powered down and replaced without the need to re-boot the
>entire system, provided the hardware permitted drive replacement with the
>power on.

A "shadowed", or "mirrored", filesystem is very reliable; however, for a large
site this can become quite expensive.  Imagine having to buy twice (or more)
the amount of disk in order to hold all your data.  Other variations of RAID
filesystems (a mirror disk is classed as a "Level 1" RAID) can employ error
correction techniques to obtain more than adequate reliability, without wasting
50% of your disk capacity.  In addition, a mirrored filesystem won't help your
I/O throughput.

The equation below shows how to calculate the effective MTBF for a multi-disk
filesystem.  The variables are: the MTBF of a disk (MTBFdisk), the mean time to
repair for a disk (MTTRdisk), the number of data disks (#data) and the number
of disks with redundant data (#ecc).

                     ( MTBFdisk ) ^ 2
    MTBFfs = --------------------------------
              #data(#data + #ecc) * MTTRdisk

>The simplest reason to use more than one drive is to create a filesystem
>larger than any of the single drives involved.  I've seen this referred to as
>"spanning".  The beginning of one drive is the logical end of the previous
>drive.  Thus, two 250MB drives could be combined to make a single 500MB
>logical drive, and so on.

However, this simple approach is not without its own risks.  If redundant
information is not kept, the first disk failure takes the whole filesystem
with it, and the MTBF reduces to:

              MTBFdisk
    MTBFfs = ----------
               #data

So if you span two of your 50,000 hour MTBF disks, your filesystem ends up
with an MTBF of 25,000 hours.  And the more disks you add, the worse it gets.
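
To see how quickly that falls off, here is a minimal Python sketch of the
no-redundancy formula.  The function name and the list of disk counts are my
own choices for illustration; the 50,000 hour figure is the one used above.

```python
def spanned_mtbf_hours(mtbf_disk_hours, n_disks):
    """MTBF of a spanned filesystem with no redundancy: MTBFdisk / #data."""
    return mtbf_disk_hours / n_disks

# Illustrative disk counts (an assumption, not taken from any real site).
for n in (1, 2, 4, 8, 11):
    hours = spanned_mtbf_hours(50_000, n)
    print(f"{n:2d} disks: {hours:8.0f} hours ({hours / 24:5.0f} days)")
```

Each added disk is one more thing that can fail, so the filesystem MTBF is
simply the single-disk figure divided by the number of disks.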

Try working out the numbers for yourself.  Consider a filesystem which you want
to span 11 disks.  A striped filesystem, with a single ecc disk, would require
a total of 12 drives.  Using 50000 hours as the MTBF, and 10 hours for the time
to repair, you get a mean time between failures for the filesystem of 1,893,939
hours (or 216 years).  A mirrored filesystem (spanning the disks) of the same
capacity would require a total of 22 drives, and would have a MTBF of 1,033,057
hours (or 117 years).  For the worst case, a simple "spanned" filesystem would
require only 11 disks, but would have a MTBF of 4,545 hours, or 189 DAYS.
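
If you would rather let the machine do the arithmetic, the three cases can be
checked with a short Python sketch of the two equations given earlier.  The
function and variable names are hypothetical, and the year conversion assumes
24 * 365 hours per year.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: ignores leap years

def mtbf_redundant(mtbf_disk, mttr_disk, n_data, n_ecc):
    """MTBFfs = MTBFdisk^2 / (#data * (#data + #ecc) * MTTRdisk)."""
    return mtbf_disk ** 2 / (n_data * (n_data + n_ecc) * mttr_disk)

def mtbf_spanned(mtbf_disk, n_data):
    """No redundancy: MTBFfs = MTBFdisk / #data."""
    return mtbf_disk / n_data

MTBF_DISK = 50_000  # hours, per disk
MTTR_DISK = 10      # hours to repair a failed disk
N_DATA = 11         # disks' worth of data

striped = mtbf_redundant(MTBF_DISK, MTTR_DISK, N_DATA, 1)        # 12 drives
mirrored = mtbf_redundant(MTBF_DISK, MTTR_DISK, N_DATA, N_DATA)  # 22 drives
spanned = mtbf_spanned(MTBF_DISK, N_DATA)                        # 11 drives

print(f"striped : {striped:12.0f} hours ({striped / HOURS_PER_YEAR:.1f} years)")
print(f"mirrored: {mirrored:12.0f} hours ({mirrored / HOURS_PER_YEAR:.1f} years)")
print(f"spanned : {spanned:12.0f} hours ({spanned / 24:.1f} days)")
```

Changing N_DATA or MTTR_DISK shows how sensitive the redundant cases are to
repair time, since MTTRdisk sits in the denominator.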

T.T.F.N.,
dave truesdell (truesdel@prandtl.nas.nasa.gov)

"Testing can show the presence of bugs, but not their absence." -- Dijkstra
"Each new user of a new system uncovers a new class of bugs." -- Kernighan