Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!nuchat!steve
From: steve@nuchat.UUCP (Steve Nuchia)
Newsgroups: comp.arch,comp.unix.wizards,comp.os.misc
Subject: Re: A Shared Libraries Solution
Message-ID: <400@nuchat.UUCP>
Date: Thu, 15-Oct-87 19:32:56 EDT
Article-I.D.: nuchat.400
Posted: Thu Oct 15 19:32:56 1987
Date-Received: Sat, 17-Oct-87 10:42:17 EDT
References: <1057@unicus.UUCP>
Organization: Public Access - Houston, Tx
Lines: 137
Summary: please explain further
Xref: mnetor comp.arch:2650 comp.unix.wizards:4936 comp.os.misc:299

In article <1057@unicus.UUCP>, craig@unicus.UUCP (Craig D. Hubley) writes:
> One effective way to deal with revisions to shared libraries is by maintaining
> several versions around, and have PROGRAMS know which revision levels they 
> can count on to perform code.

With you so far...

> If a program breaks inside or `on the border' of a library routine,
> it stores that fact, and the revision level of the library it was using.
> Thereafter, it will ask for `Print Service 8.0 - 10.0', and if only 10.2
> is available, it will fail with a robust error, perhaps searching elsewhere
> on the system for another print server or archived library of old service
> routines.  In fact, most services can deal quite handily with such problems,
> simply by having backup disk storage that contains the older services, if
> a particular site has programs that need them.  If you don't need the older
> services, then don't store them.  The worst that will happen is that your

Still with you...

> program will try the new one, fail, back up to the error (if possible),
> or restart if not, and ask you to make the old one available.

Does this not beg the question of how the program _detects_ the failure?

> On a microcomputer, this might mean inserting a floppy.  Quite a bit friendlier
> than weird data errors, hein?  

Jah, if it works.

> XNS uses a similar system for services to know whether or not they can 
> serve various programs, though I don't know if the program-based revision
> tracking is automatic.  It might be by now.

Scarier and scarier.

> This method is effective because:
> 	It frees you, unlike "check the interface" from having to debug
> 	the whole system before getting an actual solution.

I think I understand you to be saying that your approach allows the system
to run in the presence of a new, untested library?  How does this differ
(in the light of the sequel) from the "old way"?

> 	The revision-tracking is automatic.

True, under the assumptions.  Is this a Good Thing?

> 	Programs assume new services will work until they actually fail.

This is the heart of the matter.  Your proposal is for an optimistic
policy, whereas the traditional approach is pessimistic.  In the pessimistic
approach a program asks for the library(s) it has been tested with, and
someone has to manually update its idea of which libraries are good.  In
your optimistic approach a program would use the latest available library
that had not been _found_to_be_buggy_ (in a relative sense).
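
To pin down the distinction, here is a minimal sketch of the two policies
as I read them.  It is in Python purely for illustration; the function
names, the (major, minor) tuples, and the "tested" and "known_bad" sets are
my own assumptions, not anything taken from the proposal or from XNS.

```python
from typing import Iterable, Optional, Set, Tuple

Version = Tuple[int, int]  # hypothetical (major, minor) pair, e.g. (10, 2)


def pick_pessimistic(available: Iterable[Version],
                     tested: Set[Version]) -> Optional[Version]:
    """Newest available revision that someone has explicitly approved."""
    candidates = [v for v in available if v in tested]
    return max(candidates, default=None)


def pick_optimistic(available: Iterable[Version],
                    known_bad: Set[Version]) -> Optional[Version]:
    """Newest available revision not yet recorded as having failed."""
    candidates = [v for v in available if v not in known_bad]
    return max(candidates, default=None)


available = [(8, 0), (10, 0), (10, 2)]
print(pick_pessimistic(available, tested={(8, 0), (10, 0)}))  # (10, 0)
print(pick_optimistic(available, known_bad=set()))            # (10, 2)
print(pick_optimistic(available, known_bad={(10, 2)}))        # (10, 0)
```

The only difference is which set a human (or the program) has to maintain:
a list of revisions known to be good, or a list of revisions known to be bad.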

> 	Programs can find problems and log them, notifying the user,
> 	or users can find problems.  In either case, the `buggy' revision
> 	will no longer be used by that program.  Or at least, that copy
> 	of that program.  An alternative would be to have the library store
> 	the failed-program data, but that would impose a burden.

Exactly how are programs to do this?  Is this not a close relative to
the halting problem?  I've heard that the ESS5 control program was
over 75% "audit" code - keeping an eye on the other 25%.  This seems
like an extreme penalty (if my understanding is correct) for not
proving the operative 25%, and illustrates the practical difficulty
of software self-test.
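
To make the difficulty concrete, here is a small sketch (Python, for
illustration only) of what automatic detection can and cannot see.  All the
names are hypothetical, and the two "library routines" are deliberately
broken stand-ins.  A crash is easy to log; a quietly wrong answer is only
caught if somebody wrote an audit for that particular routine.

```python
def crashing_sqrt(x):
    """Stand-in for a bad revision that fails loudly."""
    raise RuntimeError("library fault")


def buggy_sqrt(x):
    """Stand-in for a bad revision that is wrong but quiet."""
    return x / 2.0


def call_and_audit(routine, x, known_bad, version, check=None):
    """Call routine(x); record `version` in known_bad if a failure is seen.

    `check`, if given, is a caller-supplied audit of the result.
    Returns the result, or None when a failure was detected.
    """
    try:
        result = routine(x)
    except Exception:
        known_bad.add(version)          # loud failure: easy to detect
        return None
    if check is not None and not check(x, result):
        known_bad.add(version)          # quiet failure: needs an audit
        return None
    return result


bad = set()
call_and_audit(crashing_sqrt, 16.0, bad, (10, 1))
print(bad)                                              # {(10, 1)}
print(call_and_audit(buggy_sqrt, 16.0, bad, (10, 2)))   # 8.0, accepted
print((10, 2) in bad)                                   # False
audit = lambda x, r: abs(r * r - x) < 1e-9
print(call_and_audit(buggy_sqrt, 16.0, bad, (10, 2), check=audit))  # None
print((10, 2) in bad)                                   # True
```

Without the audit the wrong answer sails through and the revision is never
marked bad, which is exactly the case I am worried about.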

> 	Unlike "Revision X.Y or greater", such as the Amiga uses, it does
> 	not assume that upgrades are always robust.  As anyone involved in
> 	large systems design should know, the NUMBER of bugs remains constant
> 	above a certain size... they only move around.

Agreed, the x.y or greater approach is even more optimistic than yours,
since it makes no explicit provision for buggy (just "old") libraries.
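
A sketch of the difference, using the "Print Service 8.0 - 10.0" example
from earlier in the quoted article.  This is Python for illustration only,
and the helper names are hypothetical; I am not claiming this is how the
Amiga or XNS actually implement the check.

```python
def at_least(version, minimum):
    """The 'revision X.Y or greater' rule: any newer revision is accepted."""
    return version >= minimum


def within(version, lowest, highest):
    """The bounded rule: only revisions inside the stated range."""
    return lowest <= version <= highest


def resolve(available, acceptable):
    """Newest available revision satisfying `acceptable`, else a clean error."""
    candidates = [v for v in available if acceptable(v)]
    if not candidates:
        raise LookupError("no acceptable library revision available")
    return max(candidates)


available = [(10, 2)]                       # only 10.2 is installed
print(resolve(available, lambda v: at_least(v, (8, 0))))   # (10, 2)
try:
    resolve(available, lambda v: within(v, (8, 0), (10, 0)))
except LookupError as err:
    print("robust error:", err)
```

The open-ended rule happily hands back 10.2; the bounded rule refuses it
and reports the problem, leaving the caller free to go looking for an
older copy.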

> 	It is being effectively employed, at least partially, in XNS, and 
> 	I believe that a similar, though less straightforward, system is
> 	used in IBM mainframes.

Perhaps I misunderstand you.  Do these operational systems employ human
intervention in the error detection loop?

> Some disadvantages:
> 	Programs would have to store failed-version information on every
> 	shared library they use.  This is fairly minimal in terms of size,
> 	but restarts, and retries, could use up a fair bit of computing time,
> 	where libraries change often, or many copies of a program exist.

Looks like a proper analysis.

> 	The automatic-logging aspect of the system would be subject to bugs.

True, but such things can be managed easily - any _specific_ library
service can be made robust; the difficulty that brings us to this
discussion lies in making a large, diverse, and ever-changing
collection of services robust in the aggregate.

> 	Users could become `spoiled' enough to count on the system to find
> 	incompatibilities, and fail to look for data errors themselves.

Naive users are a problem in many areas, password security being one of the
most well known, with inadequate failure reporting running a close second.

> 	Shared libraries would have to be checked, on open, for compatibility.

If by this you mean comparing them against the stored list of compatibilities,
I had understood this to be a part of the overhead of that scheme.  Do you
have something else in mind?  Perhaps you allude to the "testing" of the
library on first encounter?

> Considering some of these are problems already extant in the existing
> bug-spotting procedures, and the worst thing that gets added is a little
> extra data and a few more cycles to open libraries, it seems pro overall.

Actually, assuming I properly understand you, the user complacency is
probably the worst that gets added.  Especially if this extends to
the software engineering folks, who _should_ be testing things and
not relying on a mathematically unsound (isomorphic with the halting
problem) problem detection and logging scheme.

> Any comments, particularly from those who have used distributed services
> under such a system?

I think the system you advocate, call it "optimistic but reactionary",
is a useful addition to the family of library sharing algorithms.  It
should not be expected to work miracles, and indeed should be seen as
a way of integrating _user_ problem reporting into the library upgrade
cycle rather than eliminating human testing.

> This has been an interesting debate.  Keep it up.
I concur.
-- 
Steve Nuchia	    | [...] but the machine would probably be allowed no mercy.
uunet!nuchat!steve  | In other words then, if a machine is expected to be
(713) 334 6720	    | infallible, it cannot be intelligent.  - Alan Turing, 1947