Newsgroups: comp.unix.internals
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!sdd.hp.com!elroy.jpl.nasa.gov!decwrl!frnkmth!bill
From: bill@franklin.com (bill)
Subject: shared libraries can be done right
Message-ID: <20May91.042630.9136@franklin.com>
Summary: here's how
Organization: Franklin Electronic Publishers
Date: Mon, 20 May 91 04:26:30 GMT

This nonsense that what's-name has been spouting made me decide to describe how I think shared libraries can be done correctly. This is basically a dynamically linked shared library scheme with the linking occurring as each page is read from its file.

Here's how I'd do it.

To create a shared library, you first create an executable image that contains shared references only. This is a two step process. The first step creates an ordinary object file with undefined symbols. The second step, linking the object with its shared libraries, associates each undefined symbol with the file that defines it, thus converting it to a shared reference. An object file is executable once the only references it contains are shared references.

Note that this permits mutually referencing shared libraries if we allow this linking to occur with the unshared version of a library.

In my scheme, "shared library" is a misnomer; they are really "dynamically linked executables" or some such buzz; the difference between an executable and a shared library is only that executables have an entry point and shared libraries don't.

When a file is exec'ed, its referenced shared libraries are opened if they haven't been already. This is done recursively so that a process knows at the start that all the shared libraries it needs are available and don't conflict in their addresses. (This isn't necessary, actually; one could defer this till later, with the attendant failures at obscure points in random programs.)
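
To make the exec-time step concrete, here is a toy user-space model in Python; it is a sketch, not kernel code. The names Image, AddressConflict and open_all are made up for illustration, and the dictionary `files` merely stands in for the file system. It assumes each image records a fixed base address, a size, and the names of the libraries it references.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Hypothetical record for an executable or a shared library file."""
    name: str
    base: int   # address chosen when the image was linked
    size: int   # bytes of virtual memory the image occupies
    needs: list = field(default_factory=list)  # names of referenced libraries


class AddressConflict(Exception):
    """Two images opened by one process claim overlapping addresses."""


def open_all(name, files, opened=None):
    """Open `name` and, recursively, every library it references.

    `files` maps a name to its Image; `opened` collects the images
    opened so far on behalf of this process.
    """
    if opened is None:
        opened = {}
    if name in opened:       # already open, nothing more to do
        return opened
    image = files[name]      # a KeyError here models "library not available"
    for other in opened.values():
        overlaps = (image.base < other.base + other.size
                    and other.base < image.base + image.size)
        if overlaps:
            raise AddressConflict(f"{image.name} overlaps {other.name}")
    # Record the image before recursing, so that mutually referencing
    # libraries terminate instead of looping forever.
    opened[name] = image
    for dep in image.needs:
        open_all(dep, files, opened)
    return opened
```

Because an image is recorded before its dependencies are visited, two libraries that reference each other are each opened exactly once. A missing library or an address conflict shows up at exec time rather than at some later page fault.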

Each time that a page is loaded in from a file (at exec time or at a page fault), any shared references are satisfied in that page before the page is made available to any process.

Here are the overheads:

Program startup requires making the shared libraries available. This should be significant only the first time that a shared library is referenced; all other references should discover quickly that the shared library is already open.

When a shared library is opened, a special segment is created for its symbol table. This increases, slightly, the memory needed for using the shared library.

When an executable or shared library is opened, a segment has to be created to hold the shared reference information. This also costs some memory.

When a page is loaded from its file, its shared references must be resolved. This implies references to the shared reference information for its file and the symbol tables of referenced shared libraries. Depending on the implementation, this could happen each time the page is faulted in, or it could be done only the first time the page is read from its file and that page could be then stored in a swap area. In any case, this amounts to a very simple and fast loop to do the fixups and it only has to be done occasionally.

Here is the drawback:

Each shared library must exist in a specified region of virtual memory and this must be decided when the shared library is created.

If one wanted to be clever and avoid this problem, the shared libraries could also contain relocation information. The way this would be used is this: a shared library would have a "preferred" location, one where the library gets placed if there are no conflicts. When located at this preferred location, no relocation is done. However, if there is a conflict when an executable is started, a new set of segments is created for a relocated version of the shared library, at a system-selected address; these new segments could be used to deal with other conflicts as well.
Relocation would incur an additional overhead, but only for those processes that reference a relocated shared library. Also, by opening shared libraries in reverse order of linkage, system shared libraries generally would never be relocated, resulting in the cost being borne by users (or vendors) who create shared libraries and don't take care to avoid the system's library addresses.

---

As near as I can tell, this gives all of the advantages of shared libraries, at a minimal cost. It does not require special coding of the libraries or any other nonsense; one merely links them a bit differently.

Anyone see any material drawbacks?

Note that if what's-name comes back with noise about how it doesn't solve any problems or the mere statement that the overhead is unacceptable, I'll ignore him and I hope you all will too. Those of us interested in constructive activity are aware of the problems that shared libraries can solve and don't need to prove their existence. As for the overhead, all the overheads I can see are microscopic in comparison to the overhead of, e.g., additional paging or swapping induced by not sharing one's libraries, so I see this also as a non-issue. If someone has evidence that I've overlooked something, I'll be happy to examine it, but I'm not interested in mere assertion.
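
For anyone who wants to see how small the page-load fixup loop described under the overheads is, here is a toy model in Python; again a sketch, not kernel code. Everything named here is an assumption for illustration: the function fixup_page, the 4096-byte page, 32-bit little-endian addresses, and a shared reference record of the form (file offset, library, symbol). It also assumes that no reference straddles a page boundary.

```python
import struct

PAGE_SIZE = 4096   # assumed page size
WORD = "<I"        # assumed 32-bit little-endian address format


def fixup_page(page, page_no, refs, bases, symtabs):
    """Return a copy of `page` with its shared references resolved.

    refs    -- the file's shared reference information, a list of
               (file_offset, library, symbol) tuples
    bases   -- library name -> address the library is located at
    symtabs -- library name -> {symbol: offset within that library}
    """
    fixed = bytearray(page)
    start = page_no * PAGE_SIZE
    for offset, library, symbol in refs:
        # Only references that fall inside this page are touched.
        if start <= offset < start + PAGE_SIZE:
            address = bases[library] + symtabs[library][symbol]
            struct.pack_into(WORD, fixed, offset - start, address)
    return bytes(fixed)
```

In this model a relocated library changes only its entry in `bases`; the loop itself is the same whether the library sits at its preferred location or not. The sketch covers only references into a library, not the relocation of the library's own contents. A real implementation would presumably keep the reference list sorted or indexed by page, so that the loop visits only the entries for the page being loaded.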