Newsgroups: comp.unix.internals
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!sdd.hp.com!elroy.jpl.nasa.gov!decwrl!frnkmth!bill
From: bill@franklin.com (bill)
Subject: shared libraries can be done right
Message-ID: <20May91.042630.9136@franklin.com>
Summary: here's how
Organization: Franklin Electronic Publishers
Date: Mon, 20 May 91 04:26:30 GMT

This nonsense that what's-name has been spouting made me decide
to describe how I think shared libraries can be done correctly.
This is basically a dynamically linked shared library scheme with
the linking occurring as each page is read from its file.

Here's how I'd do it. To create a shared library, you first create
an executable image that contains shared references only. This is
a two-step process. The first step creates an ordinary object file
with undefined symbols. The second step, linking the object with
its shared libraries, associates each undefined symbol with the
file that defines it, thus converting it to a shared reference. An
object file is executable once the only references it contains
are shared references. Note that this permits mutually
referencing shared libraries if we allow this linking to occur
with the unshared version of a library. In my scheme, "shared
library" is a misnomer; they are really "dynamically linked
executables" or some such buzz; the difference between an
executable and a shared library is only that executables have an
entry point and shared libraries don't.
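
The second link step can be sketched as below. This is only an
illustration: the function names, the dictionary layout, and the
library names "libc" and "libm" are assumptions of the sketch, not
part of any existing linker.

```python
# Hypothetical model of the second link step. All names and the
# data layout are assumptions made for illustration.

def link_shared(undefined, libraries):
    """Map each undefined symbol to the library file that defines it.

    undefined: list of symbol names left undefined by step one
    libraries: dict of library file name -> set of exported symbols
    Returns (shared_refs, still_undefined).
    """
    shared_refs = {}
    still_undefined = []
    for sym in undefined:
        for lib_file, exports in libraries.items():
            if sym in exports:
                shared_refs[sym] = lib_file  # now a shared reference
                break
        else:
            still_undefined.append(sym)      # no library defines it
    return shared_refs, still_undefined


def is_executable(still_undefined):
    # Executable once the only references left are shared ones.
    return not still_undefined


libs = {"libc": {"printf", "malloc"}, "libm": {"sqrt"}}
refs, missing = link_shared(["printf", "sqrt", "frob"], libs)
```

Here "frob" stays undefined, so the object is not yet executable;
"printf" and "sqrt" have become shared references to their files.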

When a file is exec'ed, its referenced shared libraries are
opened if they haven't been already. This is done recursively so
that a process knows at the start that all shared libraries
needed are available and don't conflict in addresses. (This isn't
necessary, actually; one could defer this till later, with the
attendant failures at obscure points in random programs.)
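
A rough sketch of the recursive open at exec time follows. The
Library record, its fields, and the catalog dictionary (standing in
for the file system) are all assumptions made for the example.

```python
# Hypothetical sketch; the Library record and its fields are assumed.

class Library:
    def __init__(self, name, base, size, needs=()):
        self.name = name
        self.base = base          # fixed virtual address chosen at link time
        self.size = size
        self.needs = list(needs)  # names of libraries this one references


def open_all(name, catalog, opened=None):
    """Open `name` and, recursively, everything it references.

    catalog: dict of name -> Library (stands in for the file system)
    opened:  dict of name -> Library already opened
    Raises LookupError for a missing library and ValueError for an
    address overlap, so both failures surface at exec time.
    """
    if opened is None:
        opened = {}
    if name in opened:            # already open: cheap, and ends cycles
        return opened
    if name not in catalog:
        raise LookupError("missing shared library: " + name)
    lib = catalog[name]
    for other in opened.values():
        if (lib.base < other.base + other.size
                and other.base < lib.base + lib.size):
            raise ValueError(name + " overlaps " + other.name)
    opened[name] = lib
    for dep in lib.needs:
        open_all(dep, catalog, opened)
    return opened


catalog = {
    "prog": Library("prog", 0x0000, 0x1000, ["liba"]),
    "liba": Library("liba", 0x4000, 0x1000, ["libb"]),
    "libb": Library("libb", 0x8000, 0x1000, ["liba"]),  # mutual reference
}
opened = open_all("prog", catalog)
```

The "already open" test is what lets mutually referencing
libraries terminate, and it is also the cheap path taken by every
exec after the first one that uses a given library.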

Each time that a page is loaded in from a file (at exec time or
at a page fault), any shared references are satisfied in that
page before the page is made available to any process.
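
The per-page fixup might look something like this. The sketch
assumes 4096-byte pages, 32-bit little-endian absolute addresses,
and references that never straddle a page boundary; the record
layout and all names are hypothetical.

```python
import struct

PAGE_SIZE = 4096  # assumed page size


def fix_up_page(page, page_no, shared_refs, symtabs):
    """Patch every shared reference that falls within one page.

    page:        bytearray holding the page as read from the file
    page_no:     index of this page within the file
    shared_refs: list of (file_offset, library, symbol) for the file
    symtabs:     dict of library -> {symbol: absolute address}
    """
    start = page_no * PAGE_SIZE
    for offset, lib, sym in shared_refs:
        if start <= offset < start + PAGE_SIZE:
            addr = symtabs[lib][sym]
            # Store the resolved address as a 32-bit little-endian word.
            struct.pack_into("<I", page, offset - start, addr)
    return page


page = bytearray(PAGE_SIZE)
refs = [(PAGE_SIZE + 8, "libc", "printf"), (16, "libc", "malloc")]
tabs = {"libc": {"printf": 0x40001000, "malloc": 0x40002000}}
fix_up_page(page, 1, refs, tabs)
```

Only the reference at file offset PAGE_SIZE + 8 lies in page 1, so
only that word is patched; the one at offset 16 belongs to page 0
and is left for whenever that page is loaded. A real version would
index the references by page rather than scan the whole list.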

Here are the overheads:

Program startup requires making the shared libraries available.
This cost should be significant only the first time that a shared
library is referenced; later references should discover quickly
that the shared library is already open. When a shared library
is opened, a special segment is created for its symbol table.
This increases, slightly, the memory needed for using the shared
library.

When an executable or shared library is opened, a segment has to
be created to hold the shared reference information. This also
costs some memory.

When a page is loaded from its file, its shared references must
be resolved. This requires consulting the shared reference
information for its file and the symbol tables of the referenced
shared libraries. Depending on the implementation, this could
happen each time the page is faulted in, or it could be done only
the first time the page is read from its file and that page could
be then stored in a swap area. In any case, this amounts to a
very simple and fast loop to do the fixups and it only has to be
done occasionally.

Here is the drawback:

Each shared library must exist in a specified region of virtual
memory and this must be decided when the shared library is
created. If one wanted to be clever and avoid this problem, the
shared libraries could also contain relocation information. The
way this would be used is this: a shared library would have a
"preferred" location, one where the library gets placed if there
are no conflicts. When located at this preferred location, no
relocation is done. However, if there is a conflict when an
executable is started, a new set of segments is created for a
relocated version of the shared library, at a system-selected
address; these new segments could be used to deal with other
conflicts as well. This would incur an additional overhead, but
only for those processes that reference a relocated shared
library. Also, by opening shared libraries in reverse order of
linkage, system shared libraries generally would never be
relocated, resulting in the cost being borne by users (or
vendors) who create shared libraries and don't take care to avoid
the system's library addresses.
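
The placement policy can be sketched as follows. It assumes that
system libraries come last in linkage order, and that the pool of
system-selected addresses starts above every preferred range; the
names and tuple layout are made up for the example.

```python
def place_libraries(link_order, next_free):
    """Assign each library an address, relocating on conflict.

    link_order: list of (name, preferred_base, size) in linkage order
    next_free:  first address the system may hand out for relocation
                (assumed to lie above every preferred range)
    Returns dict of name -> (base, relocated).
    """
    placed = {}
    taken = []  # (base, size) ranges already claimed
    # Open in reverse order of linkage so the libraries linked last
    # (assumed to be the system's) claim their preferred addresses.
    for name, preferred, size in reversed(link_order):
        base = preferred
        relocated = False
        if any(base < b + s and b < base + size for b, s in taken):
            base = next_free      # system-selected address
            next_free += size
            relocated = True
        taken.append((base, size))
        placed[name] = (base, relocated)
    return placed


link_order = [
    ("libuser", 0x40000000, 0x2000),  # overlaps libc's range
    ("libc", 0x40001000, 0x4000),
]
placed = place_libraries(link_order, 0x80000000)
```

Because "libc" is opened first it keeps its preferred address, and
"libuser", whose range collides with it, is the one that pays for
relocation.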

---

As near as I can tell, this gives all of the advantages of shared
libraries, at a minimal cost. It does not require special coding
of the libraries or any other nonsense; one merely links them a
bit differently.

Anyone see any material drawbacks?

Note that if what's-name comes back with noise about how it
doesn't solve any problems or the mere statement that the
overhead is unacceptable, I'll ignore him and I hope you all will
too. Those of us interested in constructive activity are aware of
the problems that shared libraries can solve and don't need to
prove their existence. As for the overhead, all the overheads I
can see are microscopic in comparison to the overhead of, e.g.,
additional paging or swapping induced by not sharing one's
libraries, so I see this also as a non-issue. If someone has
evidence that I've overlooked something, I'll be happy to examine
it, but I'm not interested in mere assertion.