Path: utzoo!attcan!uunet!husc6!mailrus!cwjcc!hal!nic.MR.NET!tank!mimsy!chris From: chris@mimsy.UUCP (Chris Torek) Newsgroups: comp.unix.wizards Subject: Re: libraries Message-ID: <15168@mimsy.UUCP> Date: 26 Dec 88 07:59:01 GMT References: <15126@mimsy.UUCP> <1282@nusdhub.UUCP> Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742 Lines: 171 (I shall not bother much with technical points, but just clear up a few things, after which I intend not to say anything further on this subject. Winning arguments in comp.unix.wizards, or having the last word, is not that important to me. Feel free to press `n' now :-) . One `>' is rwhite@nusdhub.UUCP (Robert C. White Jr.); two `>>'s is me, etc.) First: my original point was a philosophical stance. If I may paraphrase Ken Thompson: File formats fill a much-needed gap in the Unix operating system. (He may have meant `in the kernel', but I think the gap is needed everywhere, inasmuch as it is feasible to maintain it.) Perhaps the Unix file system *should* be flexible enough not to require archive libraries for efficiency. If it is not so, perhaps the Unix file system is a failure. [Douse the flames; I said *PERHAPS*! :-/ Please take careful note of opinion words. I am also trying not to insult.] Now, Mr. White took me to task for (as I understand it) giving even a moment's consideration to the possibility that, in modern Unix file systems---by which I mean the like of the BSD FFS or the V9 FS; rehacked V7 implementations like those in the System V releases I know of might not apply---directories might in fact suffice to replace `ar' format libraries. I said, in essence: `Maybe it could work! And if it does work, maybe we can get rid of this file format, flush the extra code from make(1) [see parallel discussion of BSD make vs archives], recreate that much-needed gap, and live happily ever after.' He (correctly) accused me of not having tested the proposition at all, and then suggested >>>... an excersize in intuition and deduction try the following: Here we had an unfortunate misunderstanding: I wrote (including the square brackets): >>[steps left out: essentially, decide how much time it would take to >> link-edit by reading individual .o files instead of a .a file.] This was poor wording on my part, for which I apologise. I did not mean `left out of the exercise', I meant `left out of my summary; not copied into the >>>-level quote'. At any rate, I then performed a small `proof of concept' test: the kind of thing that a researcher should do before investing much effort in some approach, to see whether the effort is worthwhile. If the `proof' (which here means `test') fails, the concept is not worth pursuing, not in that form. I was quite surprised to find that my simplified test showed that directory libraries might possibly be *faster* than the existing archive random library format, in some cases. The concept tested well, and therefore might be worth further work, investing more effort. It might still be hopeless, of course. But it looked promising. Mr. White then insults my simple test (which, please note, I did not claim to be a thorough example, or even a typical one---whatever that may be): >While the extreme case, e.g. 1 object include, sometimes shows a >reduction, unfortunatley most of us compile things slightly more complex >than "hello world" programs. Your second example also is a fraud in >that it didn't search through a directory containing all the *.o files >normally found in libc.a. If it had you example would have failed. In >your "good case" example you only search for the hw.o file mentioned in >the "bad case" portion not a directory containing many *.o files. Ah, but I did! I must admit that my first test was on a small directory, with only the needed .o files; but I repeated the test, this time using the sequence: mkdir tmp cd tmp ar x /lib/libc.a to fill the directory with .o files. The times taken to link were, within the resolution afforded (4.3BSD-tahoe on a VAX 8250), identical. The current BSD file system does not exhibit the O(n^2) directory lookup times that more primitive V7-based systems still do; directory lookup time is somewhere between O(1) and O(n), leaning heavily towards O(1) (the namei cache averages about 85% effective on our systems as they are currently used; cache hits are O(1); but I do not care to guess as to what effects directory libraries might have on this). >More clearly there is no "selection and resolution" phase involved in >your second example, by manually including all the objects ( with *.o ) >you are instructing the loader to use "minimum" objects sepsified. As I myself pointed out. Yet: >Your example never invokes the unresolved refrences code that does all >the time consuming things we are discussing. I thought your claim was that the time-consuming part was in opening N (here 26) files instead of 2. That time is counted in `system' time. The system times for the two links were identical. >So you admit that you didn't scan the full 317, nor the directory that >contained a full 317, you only took the files you needed. Invalidating >your example. The directory contained all 317. Ld opened only 25 of those---yet it took ld no more time to open and read those 25 than it took it to open and read /lib/libc.a. One has to wonder why this is so. (Proof-of- concept: test! and then think, and test again!) There are three obvious possibilities: (a) ld is hopeless; (b) the open() syscall is cheap; (c) the cost of 25 opens is equalled and/or exceeded by the cost of skipping the 292 files (oh very well, `archive members' :-) ) contained within /lib/libc.a that were not needed by this (overly) simple program. Options (a) and (c) seem likeliest to me, but the fact remains that *something* is up. >>So what ld might do, then, is read the `.symtab' file and, using that >>information, recursively build a list of needed .o files. It could >>then open and link exactly those .o files---never touching any that are >>not needed. ... >Compare this to: >Read each .a file once. Juggle the same pointers necessary in both >examples. Write output. exit. >LD(1) says that: If any argument is a library, it is searched exactly >once at the point it is encountered in the argument list. (order is >only significant in symbol conflict, etc.) LD(1) lies. At least, it does in 4.3BSD. The `searched exactly once' is one of those Little White Lies told to simplify the situation. It really means that ld will not back up over its argv list. If, e.g., you ask it to read -lc, then -lI77, you may get unresolved references from libI77 to libc. In actuality, ld works something like this: do { hits = 0; for (p = ranlib_list; p < last; p++) { if (need_symbol(p->name)) { read_member(p->offset); hits++; off = p->offset; while (++p < last && p->offset == off) /* void */; } } } while (hits > 0); This sequence causes such a slew of syscalls---read()s and lseek()s--- that ld can open 25 files in the same amount of system time. (It will read several times in the presence of loops in the call graph, since it is attempting to maintain a partial order. There are better ways....) One might wonder, then, if perhaps the BSD file system has file-opening sufficiently `jazzed up' that any loss of performance in changing from the current scheme---or better, from an improved edition thereof---to a directory scheme is small enough to ignore, for the sake of convenience (just whose convenience, I will not spell out here). Certainly nothing is free; and certainly name lookups seem inherently more expensive than the lseek()s and read()s required by the current scheme---and, before this began, I would have guessed they would be so much more expensive that my simple test would fail. Yet I *do* have to test to know. So I did, and now I know that directory libraries are not utterly unworkable. Less efficient in general, perhaps (though I do not yet *know* this); but there are many aspects of existing Unix systems that are less efficient than they might be, usually in the name of convenience. And even if, upon further testing, I were to deem the scheme a failure after all, my philosophical objection still stands: If name lookups were cheaper, how much would the whole system benefit? If small files were stored more compactly---libraries do, in the current system, save space---how much disk might we reclaim? In short: If the file system made libraries unnecessary, would Unix improve? -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris