Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!yetti!unicus!craig
From: craig@unicus.UUCP (Craig D. Hubley)
Newsgroups: comp.arch,comp.unix.wizards,comp.os.misc
Subject: A Shared Libraries Solution
Message-ID: <1057@unicus.UUCP>
Date: Tue, 6-Oct-87 18:45:47 EDT
Article-I.D.: unicus.1057
Posted: Tue Oct  6 18:45:47 1987
Date-Received: Sat, 10-Oct-87 03:12:33 EDT
Reply-To: craig@unicus.UUCP (Craig D. Hubley)
Organization: Unicus Software Inc., Toronto, Ont.
Lines: 84
Xref: mnetor comp.arch:2531 comp.unix.wizards:4732 comp.os.misc:273


One effective way to deal with revisions to shared libraries is to keep
several versions around, and have PROGRAMS know which revision levels they
can count on to run correctly.

`Services', which are programs such as print or mail services, but could
just as easily be shared libraries (which should almost never be compiled
into code), have a revision level, such as 10.2, where the 10 is a major
revision level, and the .2 signifies changes that are not known to cause
ANY program to break.  Each `service' or library knows what revision levels
it has available or can emulate.
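
As a rough illustration of the numbering scheme, here is a minimal sketch in
Python.  The names Revision, Service and offers() are hypothetical, invented
for this example, and storing the emulated levels as a plain set is an
assumption, not a description of any real system.

```python
from typing import NamedTuple


class Revision(NamedTuple):
    """A revision level such as 10.2: major 10, minor 2 (hypothetical type)."""
    major: int
    minor: int

    @classmethod
    def parse(cls, text):
        # "10.2" -> Revision(10, 2); a bare "10" is treated as 10.0.
        major, _, minor = text.partition(".")
        return cls(int(major), int(minor or 0))

    def __str__(self):
        return f"{self.major}.{self.minor}"


class Service:
    """A service or library that knows which revisions it has or can emulate."""

    def __init__(self, name, native, emulated=()):
        self.name = name
        # Assumption: the native level and every emulated level are simply
        # kept together in one set.
        self.revisions = {Revision.parse(native)}
        self.revisions |= {Revision.parse(r) for r in emulated}

    def offers(self, revision):
        return revision in self.revisions


print_service = Service("print", "10.2", emulated=["10.0", "9.1"])
```

Because a Revision is a tuple, ordinary comparison puts 10.2 after 10.0 and
both after 9.1, which is what a range check needs.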

If a program breaks inside or `on the border' of a library routine,
it stores that fact, and the revision level of the library it was using.
Thereafter, it will ask for `Print Service 8.0 - 10.0', and if only 10.2
is available, it will fail with a robust error, perhaps searching elsewhere
on the system for another print server or archived library of old service
routines.  In fact, most services can deal quite handily with such problems,
simply by having backup disk storage that contains the older services, if
a particular site has programs that need them.  If you don't need the older
services, then don't store them.  The worst that will happen is that your
program will try the new one, fail, back up to the error (if possible),
or restart if not, and ask you to make the old one available.
On a microcomputer, this might mean inserting a floppy.  Quite a bit friendlier
than weird data errors, eh?
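
The range request and the fallback search described above might look
something like the following sketch.  Everything here is an assumption made
for illustration: the function names, the idea of an ordered list of
`sources' (installed copies first, then an archive), and the use of
(major, minor) tuples for revision levels.

```python
class RevisionUnavailable(Exception):
    """No acceptable revision was found in any of the places searched."""


def pick_revision(available, lowest, highest):
    """Return the newest available revision within [lowest, highest], or None."""
    acceptable = [rev for rev in available if lowest <= rev <= highest]
    return max(acceptable, default=None)


def open_service(name, lowest, highest, sources):
    """Search each source in order for an acceptable revision of `name`.

    `sources` is a list of (label, catalogue) pairs, where a catalogue maps
    a service name to the list of revisions stored there (hypothetical layout).
    """
    for label, catalogue in sources:
        chosen = pick_revision(catalogue.get(name, []), lowest, highest)
        if chosen is not None:
            return label, chosen
    # Nothing suitable anywhere: fail with a clear request to the user
    # rather than running against a revision known to be bad.
    raise RevisionUnavailable(
        f"{name}: need a revision between {lowest[0]}.{lowest[1]} "
        f"and {highest[0]}.{highest[1]}; please make one available"
    )


installed = {"print": [(10, 2)]}
archive = {"print": [(8, 0), (9, 1), (10, 0)]}
sources = [("installed", installed), ("archive", archive)]

# Asking for `Print Service 8.0 - 10.0' skips the installed 10.2
# and falls back to the archived 10.0.
result = open_service("print", (8, 0), (10, 0), sources)
```

If the archive is not on line, open_service() raises RevisionUnavailable,
which is the point where a real system would ask for the floppy.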

XNS uses a similar system for services to know whether or not they can 
serve various programs, though I don't know if the program-based revision
tracking is automatic.  It might be by now.

This method is effective because:

	Unlike "check the interface", it frees you from having to debug
	the whole system before getting an actual solution.

	The revision-tracking is automatic.

	Programs assume new services will work until they actually fail.

	Programs can find problems and log them, notifying the user,
	or users can find problems.  In either case, the `buggy' revision
	will no longer be used by that program.  Or at least, that copy
	of that program.  An alternative would be to have the library store
	the failed-program data, but that would impose a burden.

	Unlike "Revision X.Y or greater", such as the Amiga uses, it does
	not assume that upgrades are always robust.  As anyone involved in
	large systems design should know, the NUMBER of bugs remains constant
	above a certain size... they only move around.

	It is being effectively employed, at least partially, in XNS, and 
	I believe that a similar, though less straightforward, system is
	used in IBM mainframes.
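
To make the contrast with "Revision X.Y or greater" concrete, here is a
small sketch of the per-program bookkeeping.  It is a simplification under
stated assumptions: the FailureLog class and its methods are hypothetical,
the log excludes individual failed revisions rather than computing a range,
and JSON is used only as a convenient way to show the record surviving
between runs.

```python
import json


class FailureLog:
    """Per-program record of library revisions that have failed (hypothetical)."""

    def __init__(self):
        self.failed = {}  # service name -> set of (major, minor) tuples

    def record(self, service, revision):
        self.failed.setdefault(service, set()).add(tuple(revision))

    def choose(self, service, available):
        # Assume the newest revision works until it has actually failed.
        bad = self.failed.get(service, set())
        usable = [rev for rev in available if tuple(rev) not in bad]
        return max(usable, default=None)

    def dumps(self):
        # Tuples are written out as JSON lists.
        return json.dumps({s: sorted(revs) for s, revs in self.failed.items()})

    @classmethod
    def loads(cls, text):
        log = cls()
        for service, revs in json.loads(text).items():
            for rev in revs:
                log.record(service, rev)
        return log


available = [(10, 0), (10, 2)]
log = FailureLog()

first = log.choose("print", available)    # newest is tried first
log.record("print", first)                # ...it breaks, so remember that
second = log.choose("print", available)   # the next run avoids it

restored = FailureLog.loads(log.dumps())  # the record survives a restart
```

An "X.Y or greater" rule would keep choosing the newest revision forever;
here the program backs off to an older one once the newest has failed.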

Some disadvantages:

	Programs would have to store failed-version information on every
	shared library they use.  This is fairly minimal in terms of size,
	but restarts and retries could use up a fair bit of computing time
	where libraries change often, or many copies of a program exist.

	The automatic-logging aspect of the system would be subject to bugs.

	Users could become `spoiled' enough to count on the system to find
	incompatibilities, and fail to look for data errors themselves.

	Shared libraries would have to be checked, on open, for compatibility.

Considering that some of these problems already exist in current
bug-spotting procedures, and that the worst thing added is a little extra
data and a few more cycles to open libraries, it seems a net gain overall.

Any comments, particularly from those who have used distributed services
under such a system?

Perhaps more importantly from a UNIX point of view, could it be effectively
implemented on today's systems?

This has been an interesting debate.  Keep it up.

	Craig Hubley, Unicus Corporation, Toronto, Ont.
	craig@Unicus.COM				(Internet)
	{uunet!mnetor, utzoo!utcsri}!unicus!craig	(dumb uucp)
	mnetor!unicus!craig@uunet.uu.net		(dumb arpa)