Xref: utzoo gnu.gcc:1220 comp.unix.wizards:19881
Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: gnu.gcc,comp.unix.wizards
Subject: Re: Is your system polluted?
Summary: pollution of namespace is even worse.
Message-ID: <1552@aber-cs.UUCP>
Date: 22 Dec 89 17:34:04 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: gnu
Organization: Dept of CS, UCW Aberystwyth
        (Disclaimer: my statements are purely personal)
Lines: 90

In article <8912211630.aa04575@ICS.UCI.EDU> rfg@ICS.UCI.EDU writes:

    As part of the work I'm doing on protoize/unprotoize, I decided
    that it would be a good idea to be able to find out (for any given
    system) what the names of all of the functions declared in system
    include files are.

    I wrote the following script to do part of the job.

    The results that I got from running this script on one system are
    very saddening. It appears that (for some systems at least) there
    is an awful lot of pollution of various name spaces contained in
    the system include files. Specifically, there are lots of clashes
    of names where one name is used for two (or more) different things
    in two (or more) different include files.

    This means that you may/will get errors if particular pairs of
    include files are included into the same base file. :-(

Actually things are even worse than Ron Guilmette says. Not only do a
lot of second-rate hackers put duplicate names in system headers, but
they do the following things as well:

1) Internal kernel entities are declared in headers for application
   use. A very bad offender here is System V.3.2; some BSD versions at
   least make an attempt to bracket these within #ifdef KERNEL ...
   #endif (which is still unsatisfactory).

2) A more generic problem is that a lot of user level packages also
   declare in their headers entities that are only used internally.

3) Even worse, a lot of libraries contain externals that are not
   declared static.
   This is very dangerous, because you may unwittingly use the same
   name in your program, and then all hell breaks loose. A
   particularly bad offender is curses.

In C++ this is less troublesome, as you can stuff things within the
walls of a class, and their scope will then be local to it. Except for
typedefs, unfortunately, but at least C++ 2.0 allows encapsulation of
enums (and class names, but that is virtually unavoidable).

In C, where we don't have a proper modularization facility, the
following guidelines ought to be followed:

1) All global entities declared by a module should start with a well
   advertised module prefix, including #defines, procedures,
   variables, enums, structs, typedefs,... This has already been
   partially done with existing libraries, e.g. for prefixes 'str',
   'f' (stdio), 'w' (curses), but usually in a half-baked way. As a
   solution it is not complete, in that you may then have clashes of
   prefixes, but at least the problem becomes an order of magnitude
   less severe. In C++ this is done by putting as much as possible
   within class boundaries.

2) File names should also start with the module's prefix, both headers
   and sources. Such names can be either of the form .h (e.g.
   StreamIn.h, StreamOut.h, StreamRw.h) or /.h (e.g. Inet/Udp.h,
   Inet/Tcp.h, ...), depending usually on their number (or the length
   of the name under System V).

3) Published headers should contain only the client interface of a
   module. Actually, for sophisticated modules, the client interface
   should be split into several headers, each containing only a subset
   of entities likely to be used together. Eschew all-inclusive header
   files (e.g. like "builtin.h" in libg++).

4) The internal interfaces of a module should be in a separate set of
   headers that is not published. For example, my tree library has two
   headers, "Tree.h" and "Tree/Own.h", and the latter contains the
   declarations of utility entities used by the other sources in the
   library, and is not published.
   Splitting the header is better than bracketing with #ifdef KERNEL
   ... #endif.

5) Under Unix, published headers ought to be in /usr/include if they
   are for modules implemented at the user level, /usr/include/sys if
   they are for kernel level modules. Internal interfaces ought not to
   be in either; they ought to be in /usr/sys/h or the directory that
   holds the module sources, e.g. /usr/src/lib/libc. If there are
   multiple headers, according to rule 2),

6) All file global entities internal to a module should be declared
   static. If they cannot be, because the module is split into several
   source files, then observing rule 1 is absolutely essential.

Naturally all these rules are palliatives; what we should really have,
and, given C, C++, Unix and other similar operating systems, will not
have, is a tree of symbol tables. The best way to have this is an
object store, like in RSRE Flex or Cambridge CAP, or some Lisp
machines or systems, but this is wishful thinking... Second best would
be something like Multics, as usual.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
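
Guidelines 1 and 6 above can be sketched in C roughly as follows. This
is only a minimal illustration under stated assumptions: the "Stk"
prefix, the tiny stack module and every name in it are hypothetical and
do not come from any real library, and the published header and the
source file are collapsed into one listing for brevity.

```c
/* Hypothetical module "Stk": every name visible to clients carries the
 * module prefix (guideline 1); everything else is static (guideline 6). */
#include <assert.h>
#include <stddef.h>

/* ---- what would be published as Stk.h: client interface only ---- */
#define STK_CAPACITY 8                      /* prefixed #define */

typedef enum { Stk_Ok, Stk_Full, Stk_Empty } Stk_Status;  /* prefixed enum */

Stk_Status Stk_Push(int value);
Stk_Status Stk_Pop(int *value);
size_t     Stk_Depth(void);

/* ---- what would live in Stk.c: internals have internal linkage ---- */
static int    items[STK_CAPACITY];  /* invisible to the linker, so it   */
static size_t depth = 0;            /* cannot clash with client names   */

static int is_full(void)            /* helper, not part of the interface */
{
    return depth == STK_CAPACITY;
}

Stk_Status Stk_Push(int value)
{
    if (is_full())
        return Stk_Full;
    items[depth++] = value;
    return Stk_Ok;
}

Stk_Status Stk_Pop(int *value)
{
    assert(value != NULL);          /* caller must supply storage */
    if (depth == 0)
        return Stk_Empty;
    *value = items[--depth];
    return Stk_Ok;
}

size_t Stk_Depth(void)
{
    return depth;
}
```

A client that happens to define its own "items", "depth" or "is_full"
links against this without trouble, because those names never leave the
module's object file; only the Stk_ and STK_ names can collide, and then
only with another module that picked the same prefix.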