Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!dayton!umn-cs!stachour
From: stachour@umn-cs.UUCP
Newsgroups: comp.arch,comp.unix.wizards,comp.os.misc
Subject: Re: Big Programs Hurt Performance
Message-ID: <2204@umn-cs.UUCP>
Date: Mon, 28-Sep-87 20:59:00 EDT
Article-I.D.: umn-cs.2204
Posted: Mon Sep 28 20:59:00 1987
Date-Received: Fri, 2-Oct-87 07:26:04 EDT
References: <1665@ncr-sd.SanDiego.NCR.COM> <8579@utzoo.UUCP> <6886@eddie.MIT.EDU> <2067@sfsup.UUCP>
Organization: University of Minnesota, Minneapolis
Lines: 87
Xref: utgpu comp.arch:2294 comp.unix.wizards:4278 comp.os.misc:242
Summary: Shared Libraries do not and should not require shared-address silliness. They should be shared by name, not address.

In article <2067@sfsup.UUCP>, shap@sfsup.UUCP (J.S.Shapiro) writes:
> In article <2501@xanth.UUCP>, kyle@xanth.UUCP (Kyle Jones) writes:
> > In <14888@topaz.rutgers.edu>,
> >       hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
> > 
> > > What you really want is shared libraries.  That way, only one copy
> > > of the code is shared by all programs that use it, but you can
> > > change it.
> > 
> > Please explain more about shared libraries.
> 
> Okay, here goes. I have stayed out of this, but shared libraries I can talk
> about intelligently. Basically a shared library is a piece of code which
> is "shared" between two programs. A portion of the address space is
> reserved in advance by *everyone* for each shared library (that is, the
> shared library has a permanent reserved location in the virtual address
> space). Then, whoever needs the functionality in the shared library simply
> compiles as usual, linking in the shared version of the library instead of
> the normal version. As a result, a marker is put in the binary indicating
> which (if any) shared libraries need to be hauled in. If the marker is
> there, exec() arranges for the shared library to get mapped into your
> address space.
> 
No, reserved address space is needed only on silly machines or under
silly operating systems. Sharing should be by name, such as
mail_system_$get_mail_message, and not by some pre-bound set of addresses.
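To make "by name, not by address" concrete, here is a toy Python sketch. The SEGMENTS registry, export() and resolve() are hypothetical stand-ins for a system-wide directory of shared segments; only the name mail_system_$get_mail_message comes from the text above, and the function body is invented. The caller holds nothing but the symbolic segment$entry name, so nobody has to reserve a slot in everyone's address space.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a system-wide directory of shared segments.
# Nothing here is a real Multics or UNIX interface.
SEGMENTS: Dict[str, Dict[str, Callable]] = {}

def export(segment: str, entry: str) -> Callable[[Callable], Callable]:
    """Decorator: publish a function under the symbolic name segment$entry."""
    def register(fn: Callable) -> Callable:
        SEGMENTS.setdefault(segment, {})[entry] = fn
        return fn
    return register

def resolve(name: str) -> Callable:
    """Look up 'segment$entry' by name; no fixed load address is involved."""
    segment, sep, entry = name.partition("$")
    if not (sep and segment and entry):
        raise ValueError(f"expected segment$entry, got {name!r}")
    try:
        return SEGMENTS[segment][entry]
    except KeyError:
        raise LookupError(f"no such entry point: {name!r}") from None

@export("mail_system_", "get_mail_message")
def get_mail_message(user: str) -> str:
    # Hypothetical body; only the entry-point name comes from the article.
    return f"no new mail for {user}"
```

For example, resolve("mail_system_$get_mail_message")("stachour") returns "no new mail for stachour". Because callers keep only the name, the segment could live anywhere, or be replaced by a new version, without rebuilding the callers.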

> ... (stuff deleted) ...        In some implementations (depends
> on your hardware), the jump table points to a stub routine which
> backpatches the "real" address of your function into your code. This has
> the advantage that you only incur the shared library overhead once per
> function, but the disadvantage that you can no longer page in those pages
> from the executable - they now have to go to the paging area.

No, the call should go indirectly through a linkage area specific to your
process. Your code, and all code, should remain read-only and shared among
all processes that use it.
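A minimal sketch of the difference, assuming a toy Python model rather than the real Multics mechanism: instead of backpatching the caller's code, each process owns a private linkage table. The first reference to a name is resolved and recorded ("snapped") in that table (roughly what Multics called a linkage fault); later references go straight through it. DIRECTORY, LinkageSection and sum_of_squares are hypothetical names invented for this illustration.

```python
from typing import Callable, Dict

# Hypothetical directory of routines, keyed by symbolic segment$entry name.
DIRECTORY: Dict[str, Callable[[int], int]] = {
    "math_$square": lambda x: x * x,
}

class LinkageSection:
    """Per-process link table: each external name is resolved on first use."""

    def __init__(self, directory: Dict[str, Callable[[int], int]]) -> None:
        self._directory = directory
        self._links: Dict[str, Callable[[int], int]] = {}  # snapped links
        self.faults = 0  # number of first-use resolutions in this process

    def call(self, name: str, *args: int) -> int:
        target = self._links.get(name)
        if target is None:
            # Unsnapped link: resolve by name and record the result here,
            # in this process's private table, never in the shared code.
            self.faults += 1
            target = self._directory[name]
            self._links[name] = target
        return target(*args)

def sum_of_squares(linkage: LinkageSection, a: int, b: int) -> int:
    # "Shared code": never modified and holding no addresses; it reaches
    # external routines only by name, through the caller's linkage section.
    return linkage.call("math_$square", a) + linkage.call("math_$square", b)
```

Two processes can run the same sum_of_squares: each gets its own LinkageSection, so the first process resolves math_$square once and reuses the snapped link, while the second resolves it independently. The procedure itself is never written to, which is what lets it stay read-only and shared.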
> 
> Only one copy of the shared library text is kept in core for all users. It
> is simply mapped into all of the appropriate virtual address spaces. 

Yes, only one copy, even when the shared library is sometimes running
with different privileges. Note that most hardware architectures
impose addressing schemes under which one cannot write shared code,
since the same code cannot run in multiple modes. Even when the
hardware is OK, the operating system's memory-management mechanism
often precludes it (as in IBM's now-superseded OS/MVT).

> ... (more deleted)           Unfortunately, position independent
> code is quite difficult to do, which is why current UNIX compilers (to my
> knowledge) don't do it. This scheme is referred to as "dynamic loading."

No, position-independent code is quite easy to do. It has been done by
the GE Multics EPL and PL/I compilers for around 20 years. [For historians,
I personally consider 'C' a cross between untyped 'B' and a subset
of the EPL subset of PL/I.] By the way, what one really wants/needs
is dynamic linking, not dynamic loading.
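What "position-independent" buys can be shown on a made-up toy machine in Python. This is an assumption-laden illustration, not a model of Multics' segmented hardware or of any real instruction set. The program whose jump is relative to the current instruction runs unchanged wherever it is loaded. The program whose jump hard-codes an absolute address works only at the address it was built for.

```python
# Toy machine, purely illustrative: memory maps address -> instruction.
#   ("add", n)    add n to the accumulator
#   ("jrel", k)   jump k words relative to the current instruction
#   ("jabs", a)   jump to absolute address a
#   ("halt",)     stop and return the accumulator

def load(program, base):
    """Place a program in a fresh memory image starting at address `base`."""
    return {base + offset: instr for offset, instr in enumerate(program)}

def run(memory, start, max_steps=1000):
    acc, pc = 0, start
    for _ in range(max_steps):
        instr = memory.get(pc)
        if instr is None:
            raise RuntimeError(f"jumped to unmapped address {pc}")
        op = instr[0]
        if op == "add":
            acc, pc = acc + instr[1], pc + 1
        elif op == "jrel":
            pc += instr[1]
        elif op == "jabs":
            pc = instr[1]
        elif op == "halt":
            return acc
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit exceeded")

# Both programs skip the ("add", 100) at offset 2.
# Position-independent: the jump says "two words ahead of me".
PIC = [("add", 1), ("jrel", 2), ("add", 100), ("add", 10), ("halt",)]
# Position-dependent: the jump hard-codes address 3, valid only at base 0.
ABSOLUTE = [("add", 1), ("jabs", 3), ("add", 100), ("add", 10), ("halt",)]
```

run(load(PIC, 1000), 1000) returns 11, just as at base 0. ABSOLUTE also returns 11 at base 0, but loaded at 1000 it jumps to the unmapped address 3 and fails. Only the relative form could be mapped at different addresses in different processes without being patched.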

>  ... (more deleted)
> Side observation: If your binary is 500K, shared libraries don't help much.
> They just don't represent a significant portion of your code. If your
> binary is really that big, you probably have a lot of rethinking to do, and
> ultimately this rethought will be reflected in better performance, greater
> flexibility, and lower maintenance cost.

But your own code should be automatically shared as well, and others
should be able to use the "object-managers" that you have written
without having to put those managers into their own code.

For those wishing to 'really' understand shared code, I recommend
the book by E. I. Organick on the design of the Multics system
(The Multics System: An Examination of Its Structure).
It tells how sharing (through real dynamic linking)
is done on a system that was designed from the beginning for shared,
reliable (utility-grade, as good as the telephones or power company)
computing, and which was designed to make it easy to build good software
(an explicit goal of hardly any other system).
It's an 'ancient' book, but still more complete
on the subject than any other I know of.

Spoiler-Warning: If you don't know much about hardware instruction-set
architectures and/or programming-language run-time needs,
you may not be able to understand this book.


Paul Stachour
Honeywell SCTC:  Stachour@HI-Multics.ARPA
Univ of Minn:    stachour at umn-cs.edu