Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!ll-xn!ames!pioneer!eugene
From: eugene@pioneer.arpa (Eugene Miya N.)
Newsgroups: comp.arch
Subject: Re: Disk Striping (description and references) plus class brief
Message-ID: <2432@ames.arpa>
Date: Mon, 3-Aug-87 17:08:09 EDT
Article-I.D.: ames.2432
Posted: Mon Aug  3 17:08:09 1987
Date-Received: Tue, 4-Aug-87 05:06:25 EDT
Sender: usenet@ames.arpa
Reply-To: eugene@pioneer.UUCP (Eugene Miya N.)
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 125

This is a follow up (I got lots of letters, so I hope interest can be
stirred and more work done in this area (striping)).  I have changed
the order of Chuck's questions to do the simpler first.

>OK.  I'll bite.  And what classes are defined and what do they mean?

Supercomputers purchased by the U.S. Govt. were (are) rated for their
performance.  The rating is informal and unofficial (emphasis), done
for procurement purposes.  The work is done by the Dept. of Energy
(prior to that ERDA, and prior to that the AEC).  The rating is arbitrary
and does not involve any official measurement tool.  What I say is my
understanding of how the rating works.

The rating was developed by Sid Fernbach and George Michael while the
two were at Lawrence Livermore Lab (before it became LLNL).  I have
seen charts on the wall at LLNL which detail some of this.
Supercomputers come in 6 "classes."  Each class should be a factor of 4
to 16 more "powerful" than the preceding class depending on who you talk
to.  Classes are defined more by existing general-purpose machines which
sit in a class.  A Class 6 machine is something of the power of a Cray 1
or Cyber 205, or any of the Japanese Machines.  Class 5 computers
included the ILLIAC IV, CDC 7600, class 4: CDC 6600, IBM 370/195.
There are discrepancies: the ILLIAC had an I/O bandwidth higher than
the Cray could ever have in the near term future.

Classes came about for the same reason Berkeley Unix went through 1.0, 2.0,
3.0, and 4.0 BSD: lawyers.  Whenever new agreements or rules had to be
written, a new class or Distribution had to be negotiated (e.g. what
would an agreement for a 5.0BSD look like?...shudder!).  Now, a frequent
question: where is `my' machine (typically an Apple, VAX or SUN)?  These
machines don't rate.  The definition of a supercomputer is relative, so
at any given time, those given machines don't rate, and a class is
closed.  Sid said a VAX could be a "Class 1/2."

Problems with classes: the most obvious problem is handling parallelism.
The MPP and the Connection machine are good cases which don't fit this
rating scheme.  This includes problems like I/O.  The new database
machines should also make some of this rating interesting.

Personal note: when I first saw classes I was reminded of rating climbs
(rock, etc.), which had early versions running from 1 to 6 (or I to VI)
[why not 1 to 5 or 1 to 10?].  This got me curious and I eventually
met George and Sid.  Also I know that lots of DOE/ex-AEC people are or
were climbers, like E. Teller.  What is interesting is that climbing is
going thru a similar problem with its closed-ended rating system
(breaking into 7s).  Sorry for the digression.  This is the second time
I have described classes to the arch group.

>What is "disk stripping"?
>-- Chuck

Oh, you caught my typo!  Disk stripping is the process of cleaning the
surface of a platter before the magnetic material is deposited ;-).

I meant to say DISK STRIPING.  This is the distribution of data across
multiple "spindles" in order to 1) increase total bandwidth, 2) provide
fault tolerance (like Tandems), or 3) serve other miscellaneous
purposes.
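
To make the distribution concrete, here is a minimal sketch of the
simplest possible layout: logical blocks dealt round-robin across the
spindles.  This is only an illustration; the function names and the
stripe_unit parameter are my own assumptions, and it is not a description
of what any of the vendors mentioned in this article actually implement.

```python
def locate(block, num_disks, stripe_unit=1):
    """Map a logical block number to (disk index, block offset on that disk).

    stripe_unit is the number of consecutive logical blocks placed on one
    disk before moving on to the next disk (hypothetical parameter).
    """
    stripe_index, within = divmod(block, stripe_unit)
    disk = stripe_index % num_disks
    offset = (stripe_index // num_disks) * stripe_unit + within
    return disk, offset


def layout(num_blocks, num_disks, stripe_unit=1):
    """Return, for each disk, the logical blocks it holds in on-disk order."""
    disks = [[] for _ in range(num_disks)]
    for block in range(num_blocks):
        disk, _offset = locate(block, num_disks, stripe_unit)
        # Blocks are visited in increasing order, so appending keeps each
        # disk's list in offset order.
        disks[disk].append(block)
    return disks


if __name__ == "__main__":
    for d, blocks in enumerate(layout(8, 4)):
        print("disk", d, "holds logical blocks", blocks)
```

With 4 disks and a stripe unit of 1, logical blocks 0 through 3 land on
four different spindles, so a sequential read of those blocks can in
principle keep all four drives busy at once.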

Very little work has been done on the subject, yet a fair number of
companies have implemented it: Cray, CDC/ETA, the Japanese
manufacturers, Convex, and Pyramid (so I am informed), and I think
Tandem, etc.  Now for important perspective: it seems that striping over
3-4 disks, as in a personal computer, is a marginal proposition.
Striping over 40 disks, now there is some use.  The break-even point is
probably between 8 and 16 disks (excepting the fault tolerance case).  A
person I know at Amdahl boiled the problem down to 3600 RPM running on a 60
Hz wall clock: the mechanical bottlenecks of getting data into and out of a
CPU from a disk.  The work is not as glamorous as making CPUs, yet is just
as difficult (consider the possibility of losing just one spindle).
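
The arithmetic behind the 3600 RPM remark and the lost-spindle remark can
be sketched as follows.  The per-disk transfer rate and the per-disk
failure probability used in the demo are made-up illustrative numbers, not
measurements.  The aggregate rate is an idealized upper bound that ignores
channel and controller limits, and the failure formula assumes that disks
fail independently and that the striped set carries no redundancy.

```python
def rotation_ms(rpm=3600):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def avg_rotational_latency_ms(rpm=3600):
    """Average wait for a random sector: half a rotation."""
    return rotation_ms(rpm) / 2


def aggregate_rate(per_disk_rate, num_disks):
    """Idealized combined transfer rate of num_disks striped spindles."""
    return per_disk_rate * num_disks


def p_any_failure(p_single, num_disks):
    """Probability that at least one of num_disks independent disks fails.

    Without redundancy, one lost spindle damages the whole striped set.
    """
    return 1.0 - (1.0 - p_single) ** num_disks


if __name__ == "__main__":
    print("rotation time (ms):", round(rotation_ms(), 2))
    print("avg rotational latency (ms):", round(avg_rotational_latency_ms(), 2))
    # Hypothetical figures: 1.0 MB/s per disk, 1% failure chance per disk.
    for n in (4, 16, 40):
        print(n, "disks:", aggregate_rate(1.0, n), "MB/s ideal,",
              "P(any failure) =", round(p_any_failure(0.01, n), 3))
```

At 3600 RPM a platter turns 60 times a second, so one rotation takes
about 16.7 ms and a random request waits about 8.3 ms on average no
matter how fast the CPU is.  Striping raises the ideal transfer rate
linearly with the number of spindles, but the chance of losing at least
one spindle grows along with it.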

The two most cited papers I have seen are:

%A Kenneth Salem
%A Hector Garcia-Molina
%T Disk Striping
%R TR 332
%I EE CS, Princeton University
%C Princeton, NJ
%D December 1984

%A Miron Livny
%A Setrag Khoshafian
%A Haran Boral
%T Multi-Disk Management Algorithms
%R DB-146-85
%I MCC
%C Austin, TX
%D 1985

Both of these are pretty good reports, but more work needs to be done in
this area; hopefully, one or two readers might seriously take it up.  The
issue is not simply one of sequentially writing bits out to sequentially
lined disks.  I just received:

%A Michelle Y. Kim
%A Asser N. Tantawi
%T Asynchronous Disk Interleaving
%R RC 12496 (#56190)
%I IBM TJ Watson Research Center
%C Yorktown Heights, NY
%D Feb. 1987

This looks good, but what is interesting is that it does not cite either
of the two above reports, but quite a few others (RP^3 and Ultracomputer
based).

Kim's PhD dissertation is on synchronous disk interleaving and she has a
paper in IEEE TOC.

Another paper I have is Arvin Park's paper on IOStone, an IO benchmark.
Park is also at Princeton under Garcia-Molina (massive memory VAXen).
I have other papers, but these are the major ones.  Just start
thinking Terabytes and Terabytes.  From a badge I got at ACM/SIGGRAPH:

	Disk Space: The Final Frontier

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene