Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!henry
From: henry@utzoo.UUCP (Henry Spencer)
Newsgroups: net.space
Subject: Re: Chicago Tribune Article (long flame)
Message-ID: <4100@utzoo.UUCP>
Date: Mon, 16-Jul-84 21:59:54 EDT
Article-I.D.: utzoo.4100
Posted: Mon Jul 16 21:59:54 1984
Date-Received: Mon, 16-Jul-84 21:59:54 EDT
References: <501@denelcor.UUCP>, <406@ames.UUCP>
Organization: U of Toronto Zoology
Lines: 116

> As a recent spate of in flight failures
> has shown, extreme caution is needed in space to make things work.  The 
> margins for error are tiny and the consequences of mistakes in the hundred
> million dollar range or more.  Insurance money for spacecraft is drying up
> and getting very expensive due to failures by PAM-D, Ariane and other upper
> stages.  NASA is extremely careful because that is what it takes to make
> spacecraft work.  Even the vast documentation requirements failed to note
> a critical pin on Solar-Max that almost caused the mission to fail.  The
> paper work could be replaced by computer work at lower cost and greater
> reliability, but leaving out the tests and documentation is asking for
> megabucks down the tubes.

As several projects have demonstrated, vast documentation systems are *not*
necessary for the (rare) projects that are run *right*.

A good example of this is the SR-71 Blackbird.  It's still the world's
fastest aircraft (if you don't count the Shuttle's brief reentry), and
25 years ago it was a formidable challenge.  New ground had to be broken
in a dozen areas, including metallurgy.  [I mention this because tracking
every last piece of metal is one of the reasons frequently advanced for
needing bales of paper for everything.]  Nevertheless, it got by with
several orders of magnitude less documentation than "ordinary" aircraft
projects needed, even then.  "Do not confuse effort with work."

The basic problem with the space business right now is not the lack of
still-more-detailed documentation.  It is the "everything is required to
work right the first time" attitude.  Now, don't get me wrong.  There is
nothing wrong with "we will do our best to make sure it works the first
time"; it's definitely the only way to go.  The problem is when you
start insisting that failures are not just undesirable, but unacceptable.
This means that it is impossible to do meaningful experiments, because
they might fail.  *OF COURSE* it is expensive to build, say, a Space
Shuttle, when the roof falls in if the tiniest thing goes wrong.  How
many aircraft are required to be perfect after only a handful of test
flights?  Yet the Shuttle program not only organized things this way,
it based the whole viability of the program on the notion that the
Shuttle would be fully operational almost instantly.  This is madness,
and awesomely expensive madness too.

Even in military aircraft programs, not noted for being well-managed,
it's common for the first dozen aircraft to be allocated solely to test
work, with no expectation that they will ever be useful otherwise.
Where are the test shuttles?  Please don't tell me that the orbiters
are too expensive to be used this way; this is known as "painting
yourself into a corner", and does not connote good design to me.  In
retrospect, it is clear that the Shuttle was too ambitious a project,
trying to meet too many needs simultaneously.  The US would be much better
off with a large fleet of much smaller reusable spacecraft, plus big
expendable boosters for heavy-lift work.  Oh, true, the heavy-lift jobs
ought to be done with reusables, too -- EVENTUALLY.  But one must learn
to crawl before one can walk, and NASA is now paying the price for trying
to take shortcuts.  "Of course it'll work."  Sure.

Of course "extreme caution is needed... to make things work", of course
"margins for error are tiny", of course the consequences of mistakes are
severe -- because the whole system is organized on the assumption that
mistakes will never happen!  The margins for error should never have been
allowed to get that small, because Murphy's Law really does apply here,
as everywhere else.  "Even the vast documentation ... failed to note a
critical pin on Solar-Max that almost caused the mission to fail...",
and as we all know, it's a good thing for the Shuttle's credibility
that the Solar Max repair worked.  This sort of cliffhanger should not
be allowed to happen.  It's a travesty to design a spacecraft to be
repaired in orbit by the Shuttle and then forget to include an emergency
de-spin system, which would permit the thing to be despun for repair
in the presence of attitude-control failure.  It's ridiculous to set up
a repair mission which cannot adapt to the smallest problem.  My
understanding was that the docking failure was caused by a spike of
fiberglass sticking up; why didn't the astronaut have clippers on hand
for coping with such things?  (Yes, I know, because the spacesuits
are too clumsy for such fine work in tricky conditions... please don't
set me off about the wretched misdesign of current spacesuits...)  It's
a credit to the cleverness of the astronauts and the people on the ground
that they managed successful completion of such a zero-defects mission
after the inevitable defects showed up.

I hope the rescue mission for the PAMmed satellites is indeed mounted.
It would be another small step towards a system that is somewhat tolerant
of unexpected difficulties.  Unless, of course, the mission is a failure
because NASA, once again, assumes that the plan is perfect and nothing
will go wrong...

I realize that I am, to some degree, slandering NASA unfairly.  They do
put a lot of attention into contingency plans and such.  But this is all
to meet *expected* troubles; building in enough flexibility to meet the
*unexpected* problems is a subtly different thing.  Sometimes NASA
pulls this off, sometimes not.  It was fortunate for the Apollo 13 crew
that some smart people insisted on making the LM computer identical to
the CM one, rather than specializing it for the lunar landing only.  It
was a potentially disastrous inconvenience to them that nobody thought to
apply the same philosophy to the lithium-hydroxide air-purifier cartridges;
fortunately they managed to improvise around that one.

This same phenomenon has been noted in other contexts, notably military
aircraft projects:  lots of attention to known problem areas, but a firm
subconscious assumption that everything else will work, because it's
required to.  The only real solution to this is a strong emphasis on getting
real working hardware -- not computerized guesswork and theoretical
pontifications -- going *early*, so that the inevitable mistakes can be
found and fixed.  Testing must be thorough, and must be done on whole
systems, not just components!  The tests, and preferably the operational
service thereafter, must not be structured on the assumption that there
will be no failures:  failure-tolerance must be built into the plans,
not just the hardware.  Note that this implies designing the whole system
so that a single failure is neither disastrous nor astronomically expensive.
(I don't even want to *think* about the results of a Shuttle crashing.)
Everyone, especially Congress and the media, should be clearly told that
trouble is expected and is not cause for panic.  ["You say your program
still needs debugging, because you didn't write it correctly the very
first time?  Unacceptable.  You're fired."]

I know, it's easier said than done.  Especially for a US government
bureaucracy.  Best argument I've heard yet for private industry in space...
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry