Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!henry
From: henry@utzoo.UUCP (Henry Spencer)
Newsgroups: net.space
Subject: Re: Chicago Tribune Article (long flame)
Message-ID: <4100@utzoo.UUCP>
Date: Mon, 16-Jul-84 21:59:54 EDT
Article-I.D.: utzoo.4100
Posted: Mon Jul 16 21:59:54 1984
Date-Received: Mon, 16-Jul-84 21:59:54 EDT
References: <501@denelcor.UUCP>, <406@ames.UUCP>
Organization: U of Toronto Zoology
Lines: 116

> As a recent spate of in flight failures
> has shown, extreme caution is needed in space to make things work. The
> margins for error are tiny and the consequences of mistakes in the hundred
> million dollar range or more. Insurance money for spacecraft is drying up
> and getting very expensive due to failures by PAM-D, Ariane and other upper
> stages. NASA is extremely careful because that is what it takes to make
> spacecraft work. Even the vast documentation requirements failed to note
> a critical pin on Solar-Max that almost caused the mission to fail. The
> paper work could be replaced by computer work at lower cost and greater
> reliability, but leaving out the tests and documentation is asking for
> megabucks down the tubes.

As several projects have demonstrated, vast documentation systems are *not* necessary for the (rare) projects that are run *right*. A good example of this is the SR-71 Blackbird. It's still the world's fastest aircraft (if you don't count the Shuttle's brief reentry), and 25 years ago it was a formidable challenge. New ground had to be broken in a dozen areas, including metallurgy. [I mention this because tracking every last piece of metal is one of the reasons frequently advanced for needing bales of paper for everything.] Nevertheless, it got by with several orders of magnitude less documentation than "ordinary" aircraft projects needed, even then. "Do not confuse effort with work."

The basic problem with the space business right now is not the lack of still-more-detailed documentation. It is the "everything is required to work right the first time" attitude. Now, don't get me wrong. There is nothing wrong with "we will do our best to make sure it works the first time"; it's definitely the only way to go. The problem is when you start insisting that failures are not just undesirable, but unacceptable. This means that it is impossible to do meaningful experiments, because they might fail.

*OF COURSE* it is expensive to build, say, a Space Shuttle, when the roof falls in if the tiniest thing goes wrong. How many aircraft are required to be perfect after only a handful of test flights? Yet the Shuttle program not only organized things this way, it based the whole viability of the program on the notion that the Shuttle would be fully operational almost instantly. This is madness, and awesomely expensive madness too. Even in military aircraft programs, not noted for being well-managed, it's common for the first dozen aircraft to be allocated solely to test work, with no expectation that they will ever be useful otherwise. Where are the test shuttles? Please don't tell me that the orbiters are too expensive to be used this way; this is known as "painting yourself into a corner", and does not connote good design to me.

In retrospect, it is clear that the Shuttle was too ambitious a project trying to meet too many needs simultaneously. The US would be much better off with a large fleet of much smaller reusable spacecraft, plus big expendable boosters for heavy-lift work. Oh, true, the heavy-lift jobs ought to be done with reusables, too -- EVENTUALLY. But one must learn to crawl before one can walk, and NASA is now paying the price for trying to take shortcuts. "Of course it'll work." Sure.

Of course "extreme caution is needed...
to make things work", of course "margins for error are tiny", of course the consequences of mistakes are severe -- because the whole system is organized on the assumption that mistakes will never happen! The margins for error should never have been allowed to get that small, because Murphy's Law really does apply here, as everywhere else.

"Even the vast documentation ... failed to note a critical pin on Solar-Max that almost caused the mission to fail...", and as we all know, it's a good thing for the Shuttle's credibility that the Solar Max repair worked. This sort of cliffhanger should not be allowed to happen. It's a travesty to design a spacecraft to be repaired in-orbit by the shuttle and then forget to include an emergency de-spin system, which would permit the thing to be despun for repair in the presence of attitude-control failure. It's ridiculous to set up a repair mission which cannot adapt to the smallest problem. My understanding was that the docking failure was because of a spike of fiberglass sticking up; why didn't the astronaut have clippers on hand for coping with such things? (Yes, I know, because the spacesuits are too clumsy for such fine work in tricky conditions... please don't set me off about the wretched misdesign of current spacesuits...) It's a credit to the cleverness of the astronauts and the people on the ground that they managed successful completion of such a zero-defects mission after the inevitable defects showed up.

I hope the rescue mission for the PAMmed satellites is indeed mounted. It would be another small step towards a system that is somewhat tolerant of unexpected difficulties. Unless, of course, the mission is a failure because NASA, once again, assumes that the plan is perfect and nothing will go wrong...

I realize that I am, to some degree, slandering NASA unfairly. They do put a lot of attention into contingency plans and such.
But this is all to meet *expected* troubles; building in enough flexibility to meet the *unexpected* problems is a subtly different thing. Sometimes NASA pulls this off, sometimes not. It was fortunate for the Apollo 13 crew that some smart people insisted on making the LM computer identical to the CM one, rather than specializing it for the lunar landing only. It was a potentially-disastrous inconvenience to them that nobody thought to apply the same philosophy to the lithium-hydroxide air-purifier cartridges; fortunately they managed to improvise around that one.

This same phenomenon has been noted in other contexts, notably military aircraft projects: lots of attention to known problem areas, but a firm subconscious assumption that everything else will work, because it's required to.

The only real solution to this is a firm emphasis on getting real working hardware -- not computerized guesswork and theoretical pontifications -- going *early*, so that the inevitable mistakes can be found and fixed. Testing must be thorough, and must be done on whole systems, not just components! The tests, and preferably the operational service thereafter, must not be structured on the assumption that there will be no failures: failure-tolerance must be built into the plans, not just the hardware. Note that this implies designing the whole system so that a single failure is neither disastrous nor astronomically expensive. (I don't even want to *think* about the results of a Shuttle crashing.) Everyone, especially Congress and the media, should be clearly told that trouble is expected and is not cause for panic. ["You say your program still needs debugging, because you didn't write it correctly the very first time? Unacceptable. You're fired."]

I know, it's easier said than done. Especially for a US government bureaucracy. Best argument I've heard yet for private industry in space...

--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry