Xref: utzoo comp.unix.wizards:11037 comp.os.misc:509
Path: utzoo!attcan!uunet!husc6!rutgers!aramis.rutgers.edu!athos.rutgers.edu!hedrick
From: hedrick@athos.rutgers.edu (Charles Hedrick)
Newsgroups: comp.unix.wizards,comp.os.misc
Subject: Re: shared libraries (was tracing system calls)
Message-ID: <Sep.9.22.46.21.1988.22612@athos.rutgers.edu>
Date: 10 Sep 88 02:46:22 GMT
References: <21606@ccicpg.UUCP> <7622@boring.cwi.nl> <2040@cuuxb.ATT.COM> <7716@bigtex.uucp> <67440@sun.uucp> <7804@bigtex.uucp>
Organization: Rutgers Univ., New Brunswick, N.J.
Lines: 34

james@bigtex.UUCP (James Van Artsdalen) asks about the overhead of the
position-independent code used in supporting Sun's shared library
scheme.  I don't think that's likely to be an issue.  There are
several ways of doing position-independent code.  One is to use
PC-relative addressing where you would have used absolute before.  That
is, suppose you've got
  load r1,foo
Normally you'd expect the loader to relocate foo to an absolute
address.  If you've got PC-relative addressing, you can instead have
it take the difference between foo's location and the location of the
instruction itself (that difference is always the same, no matter
where the sharable library happens to be mapped) and encode it as the
displacement of a PC-relative mode.  If I read
the 68000 instruction book correctly, PC-relative addressing is just
as fast as absolute.  I think the Intel chips tend to use PC-relative
a lot more, and so I'd think there would be no overhead there either.
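
To make the arithmetic concrete, here is a toy model in Python (not
real 68000 code).  The offsets and function names are made up for
illustration, and I'm treating the PC as the address of the load
instruction itself, which real chips don't quite do.  The point is
only that the displacement is fixed at link time, so nothing has to be
patched when the library lands at a different address.

```python
# Toy model of "load r1,foo" inside a shared library.
# All offsets below are hypothetical, chosen only for illustration.
INSN_OFFSET = 0x10   # where the load instruction sits in the library
FOO_OFFSET = 0x40    # where foo sits in the library

# Known at link time; independent of where the library is mapped.
PC_REL_DISP = FOO_OFFSET - INSN_OFFSET


def absolute_operand(base):
    """Operand a loader would have to patch in for an absolute load."""
    return base + FOO_OFFSET


def pc_relative_target(base):
    """Address computed at run time: PC plus the fixed displacement."""
    pc = base + INSN_OFFSET   # simplification: PC = address of the insn
    return pc + PC_REL_DISP


# The absolute operand changes with the base; the displacement does not.
for base in (0x10000, 0x7F000):
    print(hex(base), hex(absolute_operand(base)),
          hex(pc_relative_target(base)), hex(PC_REL_DISP))
```

Both forms reach the same address, but only the absolute one needs a
different operand for each mapping.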

If the machine can't do that, then you can make everything indexed by
a register (the old IBM/360 scheme).  For Intel chips you could
arrange to load segment registers with the address of the code.  This
involves slight overhead, since when you call a routine in the shared
library, you have to load a register with the address of the library,
but presumably that just adds an instruction or so to those calls.
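
A rough sketch of the base-register idea, again as a toy Python model
rather than real 360 or Intel code.  The names (make_memory,
lib_routine, call_lib) and the data layout are my own inventions.
Every reference inside the library is "base register plus constant",
and the only extra work is setting the register at call time.

```python
# Hypothetical library data, keyed by offset from the library's start.
LIB_IMAGE = {0x00: 111, 0x08: 222}


def make_memory(base, image):
    """Map the library image into a flat address space at 'base'."""
    return {base + off: val for off, val in image.items()}


def lib_routine(memory, base_reg):
    # Inside the library, addresses are always base_reg + fixed offset,
    # so this body is identical no matter where the library is mapped.
    return memory[base_reg + 0x00] + memory[base_reg + 0x08]


def call_lib(memory, lib_base):
    base_reg = lib_base   # the one extra "instruction" on each call
    return lib_routine(memory, base_reg)


# Same routine, two different mappings, same answer.
for base in (0x20000, 0x90000):
    print(hex(base), call_lib(make_memory(base, LIB_IMAGE), base))
```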

Finally, if all else fails, you can use a run-time loader to resolve
symbols.  Sun has such a thing.  By various clever techniques they
avoid having to do very much run-time relocation, but when all else
fails, they can do fixups that are based on the address where you have
mapped in the sharable library.  The words that need to be fixed up
are put into a contiguous area, so that the fixups leave the majority
of the code pure.
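
I don't know the details of Sun's format, so the following is only a
guess at the general shape, written as a toy Python model with made-up
names and values.  The words needing relocation are gathered into one
table at the end of the image; the loader adds the mapping address to
each of them and never touches the code words.

```python
# Hypothetical library image: four "code" words followed by a
# contiguous table of library-relative addresses.
IMAGE = [0xA1, 0xA2, 0xA3, 0xA4,   # code, never patched
         0x0100, 0x0200]           # address table, patched at load time
TABLE_START = 4
FIXUPS = range(TABLE_START, len(IMAGE))


def load_library(image, fixup_offsets, base):
    """Return a copy of image with the listed words relocated by base."""
    loaded = list(image)           # leave the on-disk image untouched
    for off in fixup_offsets:
        loaded[off] += base        # library-relative -> absolute
    return loaded


loaded = load_library(IMAGE, FIXUPS, 0x50000)
print([hex(w) for w in loaded])
```

Since only the table changes, the code part could in principle stay
shared between processes that map the library at different addresses.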

I'd be willing to bet that on any of the common architectures a
combination of these techniques can reduce the overhead to the point
where it isn't noticeable.