Path: utzoo!utgpu!attcan!uunet!husc6!rutgers!mcnc!thorin!coggins!coggins
From: coggins@coggins.cs.unc.edu (Dr. James Coggins)
Newsgroups: comp.lang.c++
Subject: Managing C++ Libraries: Dependencies and Headers
Message-ID: <5078@thorin.cs.unc.edu>
Date: 4 Nov 88 14:59:35 GMT
Sender: news@thorin.cs.unc.edu
Lines: 284

Managing C++ Libraries: Dependencies and Headers

James Coggins and Greg Bollella
Computer Science
UNC-Chapel Hill

The use of libraries in C++ is complicated by dependencies among the classes in the library. An application program must #include the header files for all classes on which the application depends, directly or transitively. Direct dependencies are clear from the code of the application program itself. Finding transitive dependencies requires knowledge of the internal structure of the library. We might expect a program author to know and to declare what resources he is using directly, but it is unreasonable to require him to know internal structures of the libraries he is using.

For this discussion, a dependency between classes A and B exists if the header for class A or any member function of class A refers to an object of class B as a member, an argument, or a local variable. (This definition is more conservative but much simpler than the optimal definition for our purpose.)

Due to the complex dependencies between classes in a library, many header files may be required even for applications that use objects of just one class. Without management techniques such as we will describe, ensuring that all of the necessary headers are included requires analysis of the entire dependency hierarchy of the library by the library user. We consider knowledge of the internal structure of the library to be an unacceptable burden on the application developer (or on the library developer!). We seek to minimize the interference of such incidental concerns in the development of code that uses the library.
Fortunately, we have developed a scheme that ensures that the necessary headers are included while requiring minimal effort from the application or member function writer.

An ideal solution to the problems of header file and dependency management would possess the following characteristics:

1. Whatever is needed gets included.
2. You do not pay for what you do not need.
3. You do not need to know the entire dependency hierarchy when writing main() or member functions.
4. The system should be easy to use. To make this concrete, we want only one '#include' directive to be required in main() or member functions.
5. The solution should support good software engineering practice.
6. The solution should be compatible with multiple inheritance and other anticipated evolutionary changes in C++.
7. A program written using our management system should read only the header files that are necessary and should read them only once.

The scheme we have developed conforms to these objectives and allows enough flexibility to handle unforeseen situations with minimal hassle.

Dependencies

Consider the small inheritance hierarchy below. Class foobar is a base class with derived classes foo and bar. Class baz is not part of the inheritance hierarchy, but since class foobar uses objects of class baz, any compilation of foobar requires inclusion of the header for class baz. The dependency structure of these classes is shown in the figure at right below. Notice that we include direct dependencies only; transitive dependencies (foo requires foobar which requires baz) are not noted. (If there were a direct dependency between bar and baz, for example, we would include that link.)

   Inheritance hierarchy            Dependency hierarchy

          foobar                        foo     bar
          /    \          baz             \     /
        foo    bar                        foobar
                                             |
                                            baz

The determination of which header files to include when compiling an application or a member function depends on the dependency hierarchy, which depends on internal details of the design of the whole system of classes, most of which is embodied in the header files. The creation of such dependencies is essential to the library's usefulness. If objects are to work together at all, and if code is to be reused at all, then dependencies must exist. We need a method for declaring direct dependencies and reliably tracing those dependencies throughout the dependency hierarchies when needed.

Rejected Approaches

1. Include what you need

In this approach, each member function and each application program must contain #include directives to obtain whatever is needed. This requires the application developer to understand the entire dependency structure of the library, which we find unacceptable. Furthermore, this approach leads to a long list of #include directives, whose creation interferes with the task of software development.

2. Include EVERYTHING

We considered #include-ing everything, but this violates the objective of not paying for what you don't need. In a large library, the time required to process all of the .h files is not negligible, so we reject this option.

3. Use #ifndef SYM ... #endif chains

A common solution in practice requires surrounding each .h file with compiler directives to test whether a symbol unique to that class is defined and if not to process the header. If the symbol is defined, the translator must still scan the file until reaching the #endif at the end of the file. This is a reasonable solution, which we rejected for several reasons. First, this system allows a header file to be #included many times - it will be processed only once, but we prefer that it not be touched at all if it is not required.
Second, this approach requires that the user know the path names to many header files - details that are incidental to the coding task and should be eliminated from his concern. With our hierarchical directory scheme described in a previous article, specifying path names requires that users know the whole directory hierarchy, which we find unacceptable. Third, we find the intrusion of the #ifndef...#endif directives in our header files aesthetically displeasing. We prefer a less invasive approach.

The solution we have developed is noninvasive, it requires no knowledge of the dependency structure or the directory structure of the library, and it causes header files to be touched only when required, and only once even then.

The following sections explain our scheme in a basic form. If you are a UNIX makefile hacker you can probably improve on the scheme in several ways. Feel free. If you work on PCs you should be able to understand and implement this approach without becoming a UNIX wizard.

Solution Part 1: Dependency files

In the subdirectory for each class, we define a dependency file (with a .d extension) that declares direct dependencies by defining symbols of the form D_<CLASSNAME>. After the appropriate symbols are defined, we check to see whether the "prelude" for the whole library has been defined. If not, we #include the library's prelude file. For the example above, the dependency files for classes bar and foobar are as follows:

.............................      .............................
file bar.d                         file foobar.d
.............................      .............................
#define D_BAR                      #define D_FOOBAR
#define D_FOOBAR                   #define D_BAZ
#ifndef D_PRELUDE                  #ifndef D_PRELUDE
#include "../../libprelude.h"      #include "../../libprelude.h"
#endif                             #endif

The structure of the dependency file is determined entirely by the dependency structure of the library.
Typically, a class will declare a dependency on itself, its base class if any, and the classes referenced by the class as arguments to messages or as local variables in member functions.

Other additions to the .d file can handle special situations. If there are classes with mutual dependencies, the forward declaration of the sibling classes can be inserted in the .d files of each class. The #includes for header files of specialized libraries may be inserted in the dependency file. In Dr. Coggins' library, for example, header files for the suntools libraries are #included in the .d file of the "imagetool" class which handles image display on Sun workstations. (Note for wizard readers: these special cases limit the desirability of automatic generation of dependency files!)

The ability to place #include directives for special .h files in the .d file (thereby placing the #include in every compilation involving the .h file of the class) does not preclude the option of placing #include directives for some system libraries in the specific member functions that require them, or even placing special #include directives in the .h file itself.

Solution Part 2: The Prelude File

In the main directory for the library, we define a "prelude" file which has three parts. The prelude file for the above example is given below.

The first part of the prelude file #includes system header files that we want always to be included.

The second part is a level-by-level traversal of the dependency graph from the top down in which the .d files of all classes that have been declared as being required are #included. The top-down traversal is critical to allow all of the transitive dependencies to be correctly noted. For example, if the application program references only class foo, we know to include foo.d, which contains the definition of D_FOOBAR. Since we are going top-down through the dependency hierarchy, we will *later* check D_FOOBAR and include foobar.d which defines D_BAZ and so on.
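The propagation just described can be imitated in a single file, which may make the ordering requirement easier to see. What follows is only a sketch. The sections marked by comments stand in for the separate .d files; the contents assumed for foo.d and baz.d are inferred from the text rather than shown in it; and the constants needs_baz and needs_bar are hypothetical helpers added purely so that the outcome can be inspected.

```cpp
// Single-file sketch of the top-down pass. The "file" boundaries are
// comments only; in the scheme itself each section is a separate file.

// --- application: declares only its direct dependency ---
#define D_FOO

// --- pass over the .d files, highest level first ---
#ifdef D_FOO            // stands in for foo.d (contents assumed)
#define D_FOOBAR        // foo derives from foobar
#endif

#ifdef D_BAR            // stands in for bar.d; skipped, D_BAR was never set
#define D_FOOBAR
#endif

#ifdef D_FOOBAR         // stands in for foobar.d
#define D_BAZ           // foobar uses objects of class baz
#endif

#ifdef D_BAZ            // stands in for baz.d (assumed to declare
#endif                  // no further dependencies)

// Hypothetical helpers that report which symbols ended up defined.
#ifdef D_BAZ
const bool needs_baz = true;
#else
const bool needs_baz = false;
#endif

#ifdef D_BAR
const bool needs_bar = true;
#else
const bool needs_bar = false;
#endif
```

If the foobar section were moved above the foo section, D_FOOBAR would not yet be defined when it was tested and D_BAZ would never be set. That is the failure the top-down order prevents.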
In the third part of the prelude file, the header files of all classes that have been declared to be needed are #included, once and once only. The classes are checked in bottom-up order according to the dependency hierarchy so that every .h file that is required is defined before it is needed by another class definition. Thus, all of the .h files that are needed are included, and they are included only once.

.......................................
file libprelude.h
.......................................
#define D_PRELUDE
#include <...>
#include <...>
#include <...>
#ifdef D_FOO
#include "/.../foo/foo.d"          Include .d files in top-down
#endif                             order traversal of dependency
#ifdef D_BAR                       hierarchy.  This determines
#include "/.../bar/bar.d"          which .h files will be needed
#endif                             using just compiler symbols.
#ifdef D_FOOBAR
#include "/.../foobar/foobar.d"
#endif
#ifdef D_BAZ
#include "/.../baz/baz.d"
#endif
#ifdef D_BAZ
#include "/.../baz.h"              Include .h files in bottom-up
#endif                             order traversal of dependency
#ifdef D_FOOBAR                    hierarchy.
#include "/.../foobar.h"
#endif
#ifdef D_BAR
#include "/.../bar.h"
#endif
#ifdef D_FOO
#include "/.../foo.h"
#endif

Note to wizards: This file could be automatically produced by an awk program from input resembling the input to make. We show the basic method here so that users without awk or make can implement the approach. We'll probably post wizard-level implementations later ourselves.

The prelude file looks more complex than it is. Maintaining the prelude file is also easier than it looks. Classes at the same level in the dependency hierarchy can be listed in any order, so the ordering of the sections is not as critical as it might appear. Also, the only situation requiring modification of the prelude file is the implementation of a new class, which happens relatively infrequently compared to changes in member functions. We have found that development of new classes outside the library is a safe and effective strategy.
The classes can be incorporated into the directory structure and the prelude file as they reach maturity.

Using the System

To use our strategy for managing dependencies, the application programmer must declare the classes used in the program and include the library prelude file. The writer of member functions must simply include the class's dependency file. Examples are given below:

.....................                 ........................
prog.c                                foo::reset.c
.....................                 ........................
#define D_FOO                         #include "foo.d"
#define D_BAZ                         .
#include "/.../mainlib/prelude.h"     .
.                                     .
.                                     .

We have found this to be a minimal level of invasion in the process of preparing a .c file, and while our scheme is rather costly to set up, it appears to be easy to understand and maintain.

Robustness of the Scheme

If the programmer omits a symbol definition for a class that turns out to be required by another class he does list, then everything works normally and there is no error. Thus, if the programmer does know something about the structure of the library, he can take advantage of that knowledge and minimize the administrivia in his .c file.

If the programmer omits a symbol definition for a class that is indeed required and is not implied by any class he does list, the C++ translator will flag syntax errors claiming that "foo is not a class name" when you know very well that it is.

If the programmer omits the +e1 flag on his program compilation and links with a library compiled with +e0, the linker will give error messages similar to "__foobar_vtbl__ is not defined".

If the check for a new class is placed in the first traversal in the prelude file anywhere above the highest existing class used by the new class, and in the second traversal anywhere below the lowest existing class used by the new class, then everything will work correctly, at least until the next class is entered.
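The placement rule can also be computed rather than maintained by hand. As a rough sketch only (this is not the awk program mentioned in the note to wizards, and the names DepGraph, top_down_order, symbol_for, make_prelude, and the root argument are all hypothetical), the following C++ ranks each class by the length of its longest chain of direct dependencies, emits the .d checks from the highest rank down, and emits the .h checks in the reverse order. It assumes an acyclic dependency graph, omits the system headers of the first part, and assumes that each class keeps its .d and .h files together in a subdirectory named after the class.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Class name -> classes it depends on directly (no transitive entries).
typedef std::map<std::string, std::vector<std::string> > DepGraph;

// Rank of a class: length of its longest chain of direct dependencies.
// A stored rank of -1 marks a class whose visit is still in progress.
static int rank_of(const DepGraph& g, const std::string& cls,
                   std::map<std::string, int>& ranks) {
    auto seen = ranks.find(cls);
    if (seen != ranks.end()) {
        assert(seen->second >= 0 && "cyclic dependency: not handled here");
        return seen->second;
    }
    ranks[cls] = -1;
    int r = 0;
    auto node = g.find(cls);
    if (node != g.end())
        for (const std::string& dep : node->second)
            r = std::max(r, 1 + rank_of(g, dep, ranks));
    ranks[cls] = r;
    return r;
}

// Classes ordered so that every class precedes the classes it depends on.
std::vector<std::string> top_down_order(const DepGraph& g) {
    std::map<std::string, int> ranks;
    for (const auto& entry : g) rank_of(g, entry.first, ranks);
    std::vector<std::pair<int, std::string> > keyed;
    for (const auto& r : ranks)
        keyed.push_back(std::make_pair(-r.second, r.first));
    std::sort(keyed.begin(), keyed.end());  // highest rank first, ties by name
    std::vector<std::string> order;
    for (const auto& k : keyed) order.push_back(k.second);
    return order;
}

// D_ symbol for a class, e.g. "foobar" -> "D_FOOBAR".
std::string symbol_for(const std::string& cls) {
    std::string sym = "D_";
    for (char c : cls)
        sym += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return sym;
}

// One guarded #include; the root/class/class.ext layout is an assumption.
static std::string guarded(const std::string& root, const std::string& cls,
                           const std::string& ext) {
    return "#ifdef " + symbol_for(cls) + "\n#include \"" + root + "/" + cls +
           "/" + cls + ext + "\"\n#endif\n";
}

// Text of the second and third parts of a prelude file.
std::string make_prelude(const DepGraph& g, const std::string& root) {
    const std::vector<std::string> order = top_down_order(g);
    std::string out = "#define D_PRELUDE\n";
    for (auto it = order.begin(); it != order.end(); ++it)    // .d, top-down
        out += guarded(root, *it, ".d");
    for (auto it = order.rbegin(); it != order.rend(); ++it)  // .h, bottom-up
        out += guarded(root, *it, ".h");
    return out;
}
```

For the four classes of the example, top_down_order yields bar, foo, foobar, baz. This differs from the prelude listing only in the order of foo and bar, which sit at the same level and may be listed either way.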