Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!bbn!oberon!bloom-beacon!mit-eddie!rutgers!bellcore!faline!sabre!gamma!ulysses!sfmag!sfsup!shap
From: shap@sfsup.UUCP (J.S.Shapiro)
Newsgroups: comp.arch,comp.os.misc
Subject: Re: Shared libraries (Was: Re: Big Programs Hurt Performance)
Message-ID: <2114@sfsup.UUCP>
Date: Sat, 26-Sep-87 16:23:46 EDT
Article-I.D.: sfsup.2114
Posted: Sat Sep 26 16:23:46 1987
Date-Received: Wed, 30-Sep-87 07:26:01 EDT
References: <6886@eddie.MIT.EDU> <2501@xanth.UUCP> <2067@sfsup.UUCP> <443@devvax.JPL.NASA.GOV>
Organization: AT&T-IS, Summit N.J. USA
Lines: 103
Summary: Answer to why one wants unshared libraries
Xref: mnetor comp.arch:2417 comp.os.misc:251

In article <443@devvax.JPL.NASA.GOV>, des@jplpro.JPL.NASA.GOV (David Smyth) writes:
> In article <28957@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
> 
> How is COPYING the old shared libraries into executables which need 
> them ANY savings in disk usage?  It seems it will be a DEAD LOSS:
> core (bigger executable images); virtual memory (it gets used up even
> if paged out);  AND disk space (the executable file gets bigger for EVERY
> program which needs the unshared library).
> 

I think you missed the idea. A shared library is usually not a single
monolithic object, and the incompatibility with an old version is usually
temporary. Since the libraries only provide for *unresolved* references, it
suffices as a temporary fix to haul only the problematic object out of the
old library for inclusion in your code, and continue to use the new shared
library. That is, you don't have to use *all* of the old library.
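
To make the resolution rule concrete, here is a toy model in Python. It is
only a sketch: the object names (main.o, old_qsort.o), the library name
(libc_new), and the resolve() helper are all made up for illustration, and
a real link editor does a good deal more.

```python
def resolve(objects, shared_library):
    """Map every referenced symbol to whatever ends up providing it.

    objects: {object_name: {"defines": [...], "needs": [...]}}, the files
             named explicitly on the link line.
    shared_library: (library_name, set_of_exported_symbols).
    """
    lib_name, exports = shared_library

    # Symbols defined by explicitly linked objects are already resolved.
    provider = {}
    for name, obj in objects.items():
        for sym in obj["defines"]:
            provider.setdefault(sym, name)

    needed = {sym for obj in objects.values() for sym in obj["needs"]}

    bindings = {}
    for sym in sorted(needed):
        if sym in provider:
            bindings[sym] = provider[sym]   # satisfied by an explicit object
        elif sym in exports:
            bindings[sym] = lib_name        # still unresolved: library supplies it
        else:
            raise LookupError("undefined symbol: " + sym)
    return bindings


# Hypothetical link: one object hauled out of the old library, plus the
# new shared library for everything else.
objects = {
    "main.o": {"defines": ["main"], "needs": ["printf", "qsort"]},
    "old_qsort.o": {"defines": ["qsort"], "needs": ["memcpy"]},
}
bindings = resolve(objects, ("libc_new", {"printf", "qsort", "memcpy"}))
```

Only qsort binds to the old object; printf and memcpy still come from the
new shared library.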

Two things mitigate this. First, changes in libraries are almost always
bug fixes or compatible with the documentation. If you depend on a bug,
you really *do* deserve what you get, particularly given that most
companies that produce compilation systems provide workaround lists.
If you have not been following the docs, you also deserve what you get.

The other possible case is a major upheaval in the compilation system, as
will tend to happen with the forthcoming batch of ANSI C compilers in the
market. ANSI has changed C a lot. In these cases you need to do substantial
rework anyway, and linking with the old objects is a way to get a working
interim product to your customers while you provide a real solution. Yes,
in the short term it is a loss in space if you have to do this for a lot
of routines. However, disk is cheaper than lost productivity, and on a
temporary basis most customers won't object.

> Why EVER have unsharable libraries???  

There are many architectures out there which don't support shared libraries
(particularly position independent ones) gracefully. Having a shared
library means reserving a good sized chunk of your address space for each
shared library you anticipate, and it becomes a fairly difficult
administrative problem to parcel out chunks of the address space to your
VARs.
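
A small Python sketch of the bookkeeping involved, assuming libraries are
bound to fixed address ranges. The library names, base addresses, and
sizes are invented for the example.

```python
from itertools import combinations


def find_conflicts(reservations):
    """Return every pair of reservations whose address ranges overlap.

    reservations: list of (name, base_address, size_in_bytes).
    """
    conflicts = []
    for (a, a_base, a_size), (b, b_base, b_size) in combinations(reservations, 2):
        # Half-open ranges [base, base + size) overlap when each one
        # starts before the other one ends.
        if a_base < b_base + b_size and b_base < a_base + a_size:
            conflicts.append((a, b))
    return conflicts


# Hypothetical assignments handed out to three library suppliers.
reservations = [
    ("libc", 0x80000000, 0x100000),
    ("libm", 0x80100000, 0x040000),
    ("vendor_gfx", 0x80120000, 0x080000),
]
conflicts = find_conflicts(reservations)
```

Here libm and vendor_gfx collide, and somebody has to arbitrate. Every
library that might ever grow needs slack reserved up front, which is where
the address space goes.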

On many architectures, position independent code means a performance hit of
20% (or more), and only recently have advances in hardware technology made
this acceptable. It's a tradeoff. Many architectures can't do shared
libraries at all, and any compilation system that wants to deal with these
architectures *as well as* the newer architectures faces a difficult
problem.

> Why EVER have libraries specifically linked to an executable???

See above, then I'll deal with the specific claims below:

> 	a) If it is an application which makes repeated calls
> 	   to a library, the FIRST invocation may be slower, but
> 	   all following invocations can be VERY CLOSE to the same
> 	   speed [Message/Object Programming, Brad J. Cox, see 
> 	   table 1].

Well, this isn't really a win. There are basically two techniques for
making this hack work. These are: (1) completely relocate the executable
when you load it into core to execute it, or (2) come up with a
backpatching scheme such that the first time you call a function from any
given place, some intermediate glue examines the CALL instruction and
backpatches the *real* function address into place.

Option (1) is debatable - that can be a lot of relocation, and if
your binary is big the relocation takes a long time. Whether or not this is
a good choice depends on how many times you need to fire up the binary,
how big it is, and how much disk space it saves you to use the shared
libraries. It makes doing paging efficiently hard (see below).
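
A rough Python sketch of what option (1) amounts to. The image is modeled
as a list of words and the relocation list as word offsets; both, along
with the load base, are made up here and do not follow any real object
file format.

```python
def relocate(image, relocations, load_base):
    """Return a copy of image with every relocated word adjusted.

    image:       list of integer words as laid out by the link editor.
    relocations: offsets of the words that hold link-time addresses.
    load_base:   where the loader actually placed the image.
    """
    patched = list(image)
    for offset in relocations:
        patched[offset] += load_base   # one fix-up per relocation entry
    return patched


# Hypothetical five-word image; words 0, 2 and 4 hold addresses.
image = [0x10, 7, 0x24, 0, 0x30]
relocations = [0, 2, 4]
loaded = relocate(image, relocations, 0x4000)
```

The work is linear in the number of relocation entries, and it is paid on
every exec, before the program runs its first instruction.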

Option (2) is very difficult to do on many architectures, requires careful
code generation, and prevents taking advantage of span-dependent
instructions for calls. This has its own impact, and it is potentially
sizeable.
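
For option (2), the control flow looks roughly like the Python sketch
below. The CallSite class, the LIBRARY table, and the counter are
hypothetical stand-ins; in real life the "target" is the operand of a CALL
instruction sitting in a text page, which is exactly what makes this hard.

```python
stats = {"glue_calls": 0}


def real_sqrt(x):
    return x ** 0.5


# Stand-in for the shared library's symbol table.
LIBRARY = {"sqrt": real_sqrt}


class CallSite:
    """One place in the program that calls a library function."""

    def __init__(self, symbol):
        self.symbol = symbol
        self.target = self._glue       # initially points at the glue code

    def _glue(self, *args):
        stats["glue_calls"] += 1
        # Look up the real function and backpatch this call site, so
        # later calls from here bypass the glue entirely.
        self.target = LIBRARY[self.symbol]
        return self.target(*args)

    def call(self, *args):
        return self.target(*args)


site = CallSite("sqrt")
first = site.call(9.0)     # goes through the glue and gets patched
second = site.call(16.0)   # goes straight to real_sqrt
```

Note that the patch is per call site, not per function: every distinct
place that calls sqrt pays the glue once and gets its own text modified.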

> 	b) Speed Critical Applications probably want to be vectorized,
> 	   and I would think reducing the competition for core via
> 	   shared libraries would be a BIG win if swapping is reduced
> 	   even a little bit (I don't know much about vectorized
> 	   algorithms, I only work on these archaic Suns, Vaxen, and
> 	   such Von Nueman rubbish :^) ).

Consider that both methods require modifying text pages, and this means
that you have to reserve space for these pages in your paging area.
This prevents you from paging in from the text portion of the original program
file. Shared libraries tend to be small sets of core facilities. Chances
are many more pages will reference them than there are in the shared
library, and this hurts you in swap area. Note that to make this work you
need *writable* shared text, which opens a whole other can of worms.
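
A back-of-the-envelope version of the swap cost in Python. The 4096-byte
page size and the call-site offsets are assumptions picked for the
example, not measurements of any system.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def dirty_text_pages(call_site_offsets, page_size=PAGE_SIZE):
    """Return the set of text pages that contain a patched call site."""
    return {offset // page_size for offset in call_site_offsets}


# Hypothetical byte offsets of library calls within a program's text.
call_sites = [0x0040, 0x0FF0, 0x1A00, 0x5008, 0x5400, 0x9C10]

pages = dirty_text_pages(call_sites)
swap_bytes = len(pages) * PAGE_SIZE   # swap space needed to back them
```

Six call sites dirty four pages here, so this one process needs 16384
bytes of swap for text that could otherwise have been paged straight from
the program file. Multiply by every process and every program that calls
into the library.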

One technique avoids all this: have an indirection table and a directory
in each library, or a well-known-globals list, as someone suggested. But
every call then goes through the table, which implies a remarkable
performance hit.
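
The indirection scheme, sketched in Python. The SharedLibrary class and
its method names are invented for illustration; the point is only where
the lookup happens.

```python
class SharedLibrary:
    """A library reached only through a directory and an indirection table."""

    def __init__(self, functions):
        # directory: name -> slot number, looked up once by each client.
        self.directory = {name: i for i, name in enumerate(functions)}
        # table: slot number -> current implementation.
        self.table = list(functions.values())

    def slot(self, name):
        return self.directory[name]

    def call(self, slot, *args):
        # Every call pays for this extra table fetch.
        return self.table[slot](*args)

    def replace(self, name, function):
        # A new library version changes only the table, never the caller.
        self.table[self.directory[name]] = function


lib = SharedLibrary({"strlen": len, "upper": str.upper})
strlen_slot = lib.slot("strlen")
before = lib.call(strlen_slot, "hello")

# Swap in a different (hypothetical) implementation; the caller's slot
# number stays valid and no program text was touched.
lib.replace("strlen", lambda s: len(s.encode("ascii")))
after = lib.call(strlen_slot, "hello")
```

No text page is ever written, so the paging problem goes away. The price
is the extra fetch on every single call, forever, instead of a one-time
patch.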

In short, it ain't all as easy as it sounds, which is why most compilation
systems still don't support it at all.

And that is why you want non-shared libraries.

Jon Shapiro
AT&T Information Systems