Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!yetti!unicus!craig
From: craig@unicus.UUCP
Newsgroups: comp.os.misc,comp.unix.wizards
Subject: Re: A Shared Libraries Solution
Message-ID: <1136@unicus.UUCP>
Date: Sun, 18-Oct-87 22:21:32 EDT
Article-I.D.: unicus.1136
Posted: Sun Oct 18 22:21:32 1987
Date-Received: Mon, 19-Oct-87 23:23:29 EDT
Reply-To: craig@unicus.UUCP (Craig D. Hubley)
Organization: Unicus Software Inc., Toronto, Ont.
Lines: 151
Xref: yetti comp.os.misc:285 comp.unix.wizards:4605

Steve Nuchia writes:
>In article <1057@unicus.UUCP>, craig@unicus.UUCP (Craig D. Hubley) writes:
>> One effective way to deal with revisions to shared libraries is by maintaining
>> several versions around, and have PROGRAMS know which revision levels they 
>> can count on to perform code.
>> ...
>> If a program breaks inside or `on the border' of a library routine,
>> ...
>> program will try the new one, fail, back up to the error (if possible),
>> or restart if not, and ask you to make the old one available.
>
>Does this not beg the question of how the program _detects_ the failure?

Anything more complex than a failure that simply generates an error message
would not be reliably detected.  That's why humans must stay in the loop.  The
detection is simply another process to run when application-level errors occur.
It is just another error vector, one that essentially adds the information
`version 9.4 doesn't work'.  Users could observe errors and generate this
information from within the application, with a little work, but then the
question of WHICH shared library was responsible (when several are in use)
becomes an issue.
Most automatic bug-loggers already do as much, placing the versions of the
shared libraries in use, and the version of the application code, in the 
bug-log header before the user describes the problem.  This gets sent to 
the wizards, who can then solve the halting problem themselves. :-)
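
A minimal sketch of the kind of header such a bug-logger might write, in
Python.  The function name, field names and layout are assumptions made up
for illustration, not those of any particular logger:

```python
def bug_log_header(app_name, app_version, libraries):
    """Build the text placed ahead of the user's own description of the bug.

    `libraries` maps each shared library in use to its version string.
    The "Application:", "Library:" and "Description:" labels are hypothetical.
    """
    lines = [f"Application: {app_name} {app_version}"]
    # Sort by library name so that headers from different runs compare easily.
    for name, version in sorted(libraries.items()):
        lines.append(f"Library: {name} {version}")
    lines.append("Description:")
    return "\n".join(lines) + "\n"


print(bug_log_header("ledger", "2.1", {"libm": "9.4", "libc": "3.0"}), end="")
```

The user only types below the last line; everything above it is filled in
before he is asked anything.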

>I think I understand you to be saying that your approach allows the system
>to run in the presence of a new, untested library?  How does this differ
>(in the light of the sequel) from the "old way" ?

Because it attempts to find the *known* working version first.  Only if 
this is unavailable does it try the new ones.  Eventually, one would assume,
the old versions would be removed from the system.  This supposes multiple
versions.  If simple failure is desired when the preferred `known' library is
unavailable, that can be an option (perhaps a `response' field in the program
that holds `fail', `search for known version', `try new version', or
`notify sysop').  Optimism or pessimism is then decided, application by
application, at compile or install time.
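
The `response' field could be sketched roughly as follows, in Python.  This
is one reading of the four options, not a definitive design: the names
`Response` and `choose_library`, the use of (major, minor) tuples for
versions, and the exact meaning given to each option are all assumptions.

```python
from enum import Enum


class Response(Enum):
    FAIL = "fail"
    SEARCH_KNOWN = "search for known version"
    TRY_NEW = "try new version"
    NOTIFY_SYSOP = "notify sysop"


def choose_library(preferred, known_good, known_bad, installed, response):
    """Return (version, reason); version is None when nothing is selected."""
    # The preferred, known working version always wins when it is installed.
    if preferred in installed:
        return preferred, "preferred"
    if response is Response.SEARCH_KNOWN:
        # Pessimistic: accept only versions this program has already run with.
        candidates = [v for v in installed if v in known_good]
        if candidates:
            return max(candidates), "known"
    elif response is Response.TRY_NEW:
        # Optimistic: newest installed version not already recorded as failing.
        candidates = [v for v in installed if v not in known_bad]
        if candidates:
            best = max(candidates)
            return best, "known" if best in known_good else "untested"
    elif response is Response.NOTIFY_SYSOP:
        return None, "notify sysop"
    # Response.FAIL, or no acceptable version was found.
    return None, "fail"


installed = {(9, 3), (9, 4), (9, 5)}
print(choose_library((9, 2), {(9, 2), (9, 3)}, {(9, 4)}, installed,
                     Response.TRY_NEW))
```

Here the preferred 9.2 is gone and 9.4 is recorded as failing, so an
optimistic program gets the untested 9.5, while one set to
`search for known version' would settle for 9.3.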

>> 	Programs can find problems and log them, notifying the user,
>> 	or users can find problems.  In either case, the `buggy' revision
>> 	will no longer be used by that program.  Or at least, that copy
>> 	of that program.  An alternative would be to have the library store
>> 	the failed-program data, but that would impose a burden.
>
>Exactly how are programs to do this?  Is this not a close relative to
>the halting problem?  I've heard that the ESS5 control program was

The bulk of the detection is done by users.  Only the reporting is automated,
and the prevention of further errors.  I didn't mean to imply that the program
was to find problems with its own operation.  Rather I meant that the OS
facilities for error detection would find problems by running the program
and seeing it fail in an obvious way.
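
As a minimal sketch of "failing in an obvious way", assume the only signal
is a nonzero exit status.  The function `run_and_record` and the
`failed_versions` set are hypothetical names; a real system would keep this
record with the program or the library rather than in memory.

```python
import subprocess
import sys


def run_and_record(command, library_version, failed_versions):
    """Run a program; on an obvious failure, remember the library version used."""
    result = subprocess.run(command)
    # Only a nonzero exit status counts here.  Wrong output from a program
    # that exits cleanly is invisible to this check and is left to the user.
    if result.returncode != 0:
        failed_versions.add(library_version)
        return False
    return True


failed = set()
# Stand-in for an application that dies while using library version 9.4.
crash = [sys.executable, "-c", "import sys; sys.exit(3)"]
run_and_record(crash, (9, 4), failed)
print(failed)
```

Nothing here decides whether the library or the application was at fault;
it only records that this pairing failed, which is all the scheme needs in
order to avoid the same version next time.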

>> 	It is being effectively employed, at least partially, in XNS, and 
>> 	I believe that a similar, though less straightforward, system is
>> 	used in IBM mainframes.
>
>Perhaps I misunderstand you.  Do these operational systems employ human
>intervention in the error detection loop?

Yes.  XNS doesn't have automatic `try-new-library', so far as I know,
though there was talk about it at one point, I think.  Programmers actually
have to modify the Courier programs that use resources over the Ethernet.
I could be wrong about this.  Perhaps there's an XNS wizard out there?

>> 	Users could become `spoiled' enough to count on the system to find
>> 	incompatibilities, and fail to look for data errors themselves.
>
>Naive users are a problem in many areas, password security being one of the
>most well known, with inadequate failure reporting running a close second.

One way to aid this is to have all `new library tries' send some sort of 
output report to the user's mailbox, telling him to doublecheck the output
just in case some weirdness has occurred.  Inform him that HE IS RESPONSIBLE
and that PROBLEMS CAN OCCUR.  Conscience can usually take care of the rest.
Unless there are a LOT of such updates.
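
A minimal sketch of such a report, built with Python's standard
`email.message.EmailMessage`.  The function name, the wording and the
example address are assumptions, and actual delivery to the mailbox is left
out:

```python
from email.message import EmailMessage


def new_library_notice(user, program, library, version):
    """Compose the report sent after a run against an untested library."""
    msg = EmailMessage()
    msg["To"] = user
    msg["Subject"] = f"{program} ran against untested {library} {version}"
    msg.set_content(
        f"{program} could not find a known working {library} and used\n"
        f"version {version} instead.  PROBLEMS CAN OCCUR.  You are\n"
        "responsible for double-checking the output of this run.\n"
    )
    return msg


print(new_library_notice("user@example.com", "ledger", "libm", "9.5"))
```

Sending one of these per `new library try' keeps the user in the loop, at
the cost of noise if updates are frequent.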

>> 	Shared libraries would have to be checked, on open, for compatibility.
>
>If by this you mean comparing them against the stored list of compatibilities,
>I had understood this to be a part of the overhead of that scheme.  Do you
>have something else in mind?  Perhaps you allude to the "testing" of the
>library on first encounter?

Yes and Yes.  It's all overhead, in any case.  The "testing" overhead exists
in some sense or other, even if it's simply the user eyeballing his output
to make sure that his new floating-point library didn't float into deep space.

>> Considering some of these are problems already extant in the existing
>> bug-spotting procedures, and the worst thing that gets added is a little
>> extra data and a few more cycles to open libraries, it seems pro overall.
>
>Actually, assuming I properly understand you, the user complacency is
>probably the worst that gets added.  Especially if this extends to
>the software engineering folks, who _should_ be testing things and
>not relying on a mathematically unsound (isomorphic with the halting
>problem) problem detection and logging scheme.

That's why user notification is important.  I should have mentioned it before.
The first thing to remember is that MOST PEOPLE DON'T KNOW THAT BUGS MOVE,
and that new versions of buggy code are often just buggy in a different way.
Even if it worked before.  I didn't mean to imply that the machine hides this
data from the user.  Only those parts of the testing process that it performs.

>I think the system you advocate, call it "optimistic but reactionary",

Good name.  But it could also be made pessimistic with the options I mention.
The reactionary component is necessary to minimize that worst of all bugs,
the long-undetected data error.  Once I heard that CNCP Telecommunications
lost $90 million over ten years of using a bad formula to calculate rates.
The error?  Placing a +1 under, rather than outside, a square root sign.
Where there's one bug, there are usually many.  Let the programmers root `em out.

>is a useful addition to the family of library sharing algorithms.  It
>should not be expected to work miracles, and indeed should be seen as
>a way of integrating _user_ problem reporting into the library upgrade
>cycle rather than eliminating human testing.

Perhaps an appropriate solution is to have applications know which algorithm
they are to use (since we've already stored failure data).  Provide `em all.
OS code uses ONLY what it is told, quick hacks use anything. Caveat programmer.

Nothing can eliminate human testing.  But users, after all, are the only
persons with a vested interest in spotting ALL types of problems. Their
reports are, in a sense, the only meaningful ones.  I think it is lack of
familiarity with bug-reporting procedures, and the necessity of initiating 
the operation themselves, or perhaps recording the data until they can,
that causes things to go unreported.

>> This has been an interesting debate.  Keep it up.
>I concur.

Is the posting to comp.os.misc and comp.unix.wizards appropriate?
I keep wondering if it doesn't belong elsewhere, but I can't think
of anywhere. 
 
>>Steve Nuchia	    | [...] but the machine would probably be allowed no mercy.
>>uunet!nuchat!steve  | In other words then, if a machine is expected to be
>>(713) 334 6720	    | infallible, it cannot be intelligent.  - Alan Turing, 1947

*Craig's Corollary* to the Turing Test:
"Nobody will believe it's intelligent until it lies just to cover its ass."
Quote liberally.  No applause, just throw money.  

Hope this clears up the ambiguities, Steve.  Thanks for the opportunity
to think about these things in a more directed fashion.  I appreciate it.

	Craig Hubley, Unicus Corporation, Toronto, Ont.
	craig@Unicus.COM				(optimistic, Internet)
	{uunet!mnetor, utzoo!utcsri}!unicus!craig	(pessimistic,dumb uucp)
	mnetor!unicus!craig@uunet.uu.net		(pessimistic,dumb arpa)