Path: utzoo!mnetor!uunet!husc6!cmcl2!brl-adm!adm!ADLER1@brandeis.bitnet From: ADLER1@brandeis.bitnet Newsgroups: comp.lang.c Subject: Tools for Reading Readable Code Message-ID: <11067@brl-adm.ARPA> Date: 1 Jan 88 07:21:27 GMT Sender: news@brl-adm.ARPA Lines: 124 This message may be regarded as a sequel to a message I posted several months ago entitled "Reading Readable Code" in which I asked for advice on how to read large C programs. Here is one stage of my thinking about this question. I am interested in having a tool which I think others would also find useful. It is based on the point of view that a C program is a collection of commands which are extensions to the language C. From this point of view, one can analyze a program in a certain way and one can make these new commands available to oneself with a minimum of hacking. Here is how it works. Let CF denote the set of all functions which are normally part of the language C, including those in the libraries. Let P be a C program which we assume compiles and runs and which will be fixed throughout this discussion. Let PF denote the set of all functions which are defined in the C program P . If A is a subset of PF, denote by PF[A] the C program obtained by commenting out all of the function definitions of functions which are not in A. Depending on the choice of the subset A, the program PF[A] may or may not run. I am interested in subsets A such that PF[A] does compile,link and run. One obvious condition on the subset A in order for this to happen is that every function called in PF[A] is either in A or is in CF . Let us agree to call a subset A of PF "closed" if this condition is satisfied. It is easy to see that there is a topology on PF with respect to which these "closed" sets are actually closed, and that is one reason for this choice of nomenclature. But this discussion does not require a knowledge of topology. Given a subset A of PF, we define the "closure" of A to be the smallest closed set containing A . Thus, the closure of A is the set B of all functions in PF which must be included in any working program which contains PF[A]. The closure of A will be denoted by A# . We also define K(A) to be all of the functions which are called directly within the bodies of functions in A . Then K(A) may or may not contain A. Now, for any positive integer n define L(n,A) to be A if n=1 and to be the union of L(n-1,A) and K(L(n-1,A)) if n > 1 . Since PF is a finite set, there is some positive integer m such that L(m,A) = L(m+1,A) . By the "level of A" I mean the smallest such integer m and I denote the level of A by LEV(A) . It is easy to see that the closure of A is the same as L(LEV(A),A) . Having arrived at this notion of the level of a subset A, it is possible to introduce some natural subsets A of PF which lead to interesting programs PF[A] . For example, suppose k is a positive integer and we let PF_k denote the set of all functions f in PF such the the set {f} has level at most k. Then PF[PF_k] runs and only uses relatively uncomplicated functions. Take the case k=1, for example. Then PF[PF_1] is a program which contains all of the functions in PF which call only C functions and themselves. If one wanted to study PF, one could first play with PF[PF_1] to familiarize oneself with the level 1 commands. These would also be available to one in other programs without a lot of hacking. After one mastered the level 1 functions, one could play with PF[PF_2], and so on. Many people do something like this in a random intuitive way when looking at a program, but this formulation makes possible to automate the process. Thus it is desirable to have routines which, given a set A, will do the following: (1) edit the file containing P and produce the program PF[A] . (2) compute the level of A . (3) compute the closure of A . It is also desirable to have routines which will find sets such as PF_k . There are other interesting sets which one could work with. For example, suppose one defines the dimension of a closed set A to be the largest nonnegative integer n such that there are n nonempty closed sets A1, A2, A3, ... An such that A properly contains A1 , A1 properly contains A2, etc. A minimal closed set will be called an "atom", so an atom is a closed set of dimension 0 . One could then play with all of the PF[A] where A was an atom or one could let A be the union of all of the closed sets of dimension at most k . Given a routine to compute the dimension of a closed set and a routine to collect all closed sets which have given numerical characteristics such as level and dimension and routines to take unions and intersections and complements, one could have great flexibility in configuring the program PF[A] that one would like to play with and the work of configuring it would be done by the computer. One of the functions almost certain to be commented out is main(). This allows one to write a new version of main and it is in this version of main that one writes programs which use the new commands which one has found. If one wanted to go off the deep end, one could forget all about programming and start computing the homotopy type of the space PF with the topology I have described above, but let us try to focus on the practical problem of implementing the routines which I have suggested would be desirable. Here I run up against the problem that I am really weak at writing C programs. I am not equal to the task of writing a program which will make the list PF, for example. Nor am I capable of writing a routine which will produce a list of all functions referenced by a given function. I have ideas for kluges to get around my weakness in this area but even the kluges lead to difficulties which I can't get around. Here are some possibilities: (1) In GNU emacs, there is a nice program called etags.c which produces a file called TAGS which contains all of the functions defined in the programs under study. Even I can write a program which will edit this file and produce the list PF . But I have had a lot of trouble getting etags.c to run on non-UNIX systems. On VMS , on the IBM 370, and on an IBM PC XT, it produces the file TAGS but it doesn't print the functions names (only their positions) and there is some doubt in my mind as to whether it is recognizing functions properly. I am not competent, apparently, to patch it up so that it works or to analyze etags.c by itself without going through the laborious process of implementing my suggestions for PF[PF_1] etc by hand. (2) In GNU CC there is a file called parse.y which contains a yacc (or rather bison) grammar for the GNU C compiler. If I understood the workings of yacc better, I could modify the file so that its actions would do all of the editing required to produce PF[PF_k] or PF[A] for various A . Or, it could compute the closure of a set A. These seem like handy options for a compiler, but I am not competent to do it myself. The only kluge that occurred to for discovering dependence of functions was to write a program which would do textual analysis of the error messages from the linker when PF[A] is compiled. But even that is somewhat beyond me. If anyone has any such routines or if anyone is interested in writing such routines or has any other advice, I welcome your correspondence. Sincerely, Allan Adler ADLER1@BRANDEIS.BITNET