Xref: utzoo comp.unix.wizards:23578 comp.lang.c:31312
Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!usc!apple!uokmax!munnari.oz.au!goanna!ok
From: ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe)
Newsgroups: comp.unix.wizards,comp.lang.c
Subject: Re: OK, so why _does_ ld resolve text against data?
Message-ID: <3605@goanna.cs.rmit.oz.au>
Date: 23 Aug 90 04:03:24 GMT
References: <1990Jul30.104726.22660@mtcchi.uucp> <37909@ucbvax.BERKELEY.EDU> <930@eplunix.UUCP>
Followup-To: comp.lang.c
Organization: Comp Sci, RMIT, Melbourne, Australia
Lines: 76

In article <930@eplunix.UUCP>, das@eplunix.UUCP (David Steffens) writes:
1> Now my question is, why does the linker silently resolve
1> [ a ] function reference to [ a ] global variable
1> without even a whisper of a warning? ...

To start with, there are operating systems where this kind of thing
cannot happen (love that B6700 MCP...).  I was *appalled* the first
time I had a subroutine call link with a common block.  So this is a
deliberate choice.

The UNIX linkers allow a function reference to be resolved by a global
variable because they have no way at all of telling one from another.
The title of this thread refers to "text" and "data", but a read-only
array may well be in "text" space.  Consider the "-R" flag on BSD UNIX
compilers and the "const" keyword in ANSI C.  (There _is_ symbol table
information available in COFF format, but the linker can't use it
because it isn't always there.  Galling, no?)

If you use Simula 67, Ada, Modula-2, or recent versions of C++, their
language support systems keep around enough information to ensure that
this kind of mistake _is_ detected.  cfront 2.whatever-it-is kludges it
by frobbing the names.  Ugh.  But it works.  So you might consider
switching to C++.
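To make the "no way at all of telling one from another" point concrete,
here is a minimal sketch in C of a toy symbol table.  It is not modelled
on any real object-file format or on any actual ld; the names
struct symbol, resolve_by_name and resolve_checked are hypothetical,
invented for this illustration.  The first lookup matches on the name
alone; the second assumes, purely for the sake of argument, that each
symbol's kind had survived to link time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; not any real object-file layout. */
enum sym_kind { SYM_FUNC, SYM_DATA };

struct symbol {
    const char *name;    /* the only thing a name-only linker compares */
    enum sym_kind kind;  /* known to the compiler; assumed kept here */
};

/* Name-only resolution: return the first definition whose name
 * matches, whatever kind of object it is, or NULL if none does. */
const struct symbol *
resolve_by_name(const struct symbol *defs, size_t ndefs, const char *name)
{
    assert(defs != NULL && name != NULL);
    for (size_t i = 0; i < ndefs; i++)
        if (strcmp(defs[i].name, name) == 0)
            return &defs[i];
    return NULL;
}

/* Kind-checked resolution: accept the name match only if the
 * definition is the kind of object the reference asked for. */
const struct symbol *
resolve_checked(const struct symbol *defs, size_t ndefs,
                const char *name, enum sym_kind wanted)
{
    const struct symbol *s = resolve_by_name(defs, ndefs, name);
    return (s != NULL && s->kind == wanted) ? s : NULL;
}
```

Given a table that defines "index" as data, resolve_by_name hands that
entry back for a call to index(), while resolve_checked with SYM_FUNC
refuses it.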

> Nevertheless, the linker _is_ blameworthy because it will _also_ happily
> use the address of one of my global variables to resolve a function call
> embedded in a library routine for which I have no lintable source, e.g.
>	int index;
>	main()
>	{
>		/* lots of code, none of which uses index() */
>		vendor_library_routine(); /* which, unknown to me, uses index() */
>	}

The ANSI C committee thought about this; that's precisely the
"namespace pollution" issue they were concerned about.  Unfortunately,
all that gives you is assurance that the C runtime library doesn't
pollute your namespace; no guarantees about anything else.

Myself, I don't see data/function collisions as being any worse than
function/function collisions.  There is a certain UNIX variant that I
sometimes use which provides a dynamic loading library routine, which
swipes an _amazing_ number of useful and obvious names; if I get one of
those by accident, or if my routine interferes with it, it doesn't
improve matters that at least it was a function I collided with.  If
anything, it makes the mistake _harder_ to find.

> The chances of a name collision of this sort rises exponentially
> with every new UNIX release.

Get an ANSI-compliant compiler and the chance of accidental collision
*with the C run-time library* drops to 0.  But use a vendor-supplied
function which is in neither ANSI C nor POSIX, and I'm afraid you're
right.

The ultimate problem is that C assumes a _single_ global "extern"
namespace.  Just like Fortran and Pascal, in fact.  Using COFF format,
it wouldn't be too hard to produce a "packaging tool" which took
description files
	import , ...
	export , ...
	source , ...
and did a "ld -r" to hook the files together into one library with
names mangled so that only the imports and exports were left untouched
(it would have to know about the local Pascal, C, and Fortran run-time
libraries, so that their names were preserved, but providing for that
wouldn't be hard).
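
The renaming rule such a tool would apply can be sketched in a few lines
of C.  This is only an illustration of the idea, not the tool itself: it
does not read COFF or run "ld -r", and the function names is_listed and
package_name, the "pkg__" prefix scheme, and the single "keep" list
standing in for imports, exports and run-time library names are all
assumptions of mine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if sym appears in list, else 0. */
int is_listed(const char *sym, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], sym) == 0)
            return 1;
    return 0;
}

/* Write the name sym should have after packaging into out.
 * Names on the keep list (imports, exports, run-time library)
 * are copied unchanged; every other name gets a "pkg__" prefix
 * (a hypothetical scheme) so it cannot collide with user code.
 * Returns 0 on success, -1 if out is too small. */
int package_name(const char *pkg, const char *sym,
                 const char *const *keep, size_t nkeep,
                 char *out, size_t outsz)
{
    int len;

    assert(pkg != NULL && sym != NULL && out != NULL);
    if (is_listed(sym, keep, nkeep))
        len = snprintf(out, outsz, "%s", sym);
    else
        len = snprintf(out, outsz, "%s__%s", pkg, sym);
    return (len >= 0 && (size_t)len < outsz) ? 0 : -1;
}
```

With "vendor_library_routine" on the keep list and package "vend", the
library's private index would come out as vend__index and could no
longer be satisfied by a user's global of the same name.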

I've thought about doing this, but the trouble is that I keep running
into machines which are COFF "with extensions" or modifications.

It would also be comparatively straightforward to look at the symbol
table information left by the "-g" option, except that
(a) too many compilers won't give you symbol table *and* optimisation
    at the same time and
(b) the symbol table information is not as portable as it might be
    either.
-- 
The taxonomy of Pleistocene equids is in a state of confusion.