Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site watmath.UUCP
Path: utzoo!watmath!kpmartin
From: kpmartin@watmath.UUCP (Kevin Martin)
Newsgroups: net.lang.c
Subject: Diatribe on uninitialized externs
Message-ID: <9572@watmath.UUCP>
Date: Wed, 24-Oct-84 19:51:50 EDT
Article-I.D.: watmath.9572
Posted: Wed Oct 24 19:51:50 1984
Date-Received: Thu, 25-Oct-84 03:20:13 EDT
Organization: U of Waterloo, Ontario
Lines: 121

The following article refers to the entire C environment, including the
compiler, the linker, and the operating system.

There seem to be four alternatives for what to do with externs and statics
which are not explicitly initialized:

1) Have their value be undefined (i.e. garbage).
   Disadvantages:
     Breaks many current programs. It could be argued that well-written
       programs (as opposed to 'correct' programs) would not be broken,
       since a well-written program initializes variables explicitly
       if it cares about the initial value.
     Arrays of unknown size become effectively impossible to initialize
       (at all) (see note 1)
   Advantages:
     Consistent behaviour with autos and malloc'ed space
     Consistent with normal reason (i.e. the variable contains a
       predictable value ONLY IF it has been initialized in the C source).
     Tends to encourage easy-to-read code: the reader can tell (or *should*
       be able to tell, if coded cleanly) if there is initialization
       *code* somewhere. e.g. you are sure that in
           int x;
           int y = 5;
       there is initialization code (somewhere) for 'x' but not for 'y'.
     Makes object and a.out files smaller, thus program load time is
       also reduced (note 2)(note 4).
     Allows the programmer to get genuine "bss" (un-initialized) space.
       This becomes especially important if overlays are being used,
       since it may be desired that an overlay be loaded without
       re-initializing all the variables it contains (note 4).

2) Have their value be the 0 bit pattern.
   Disadvantages:
     Programs which don't explicitly initialize their pointers and
       floats would not port to any more machines than they currently
       do (note 3)
     Arrays of unknown size containing floats, doubles or pointers
       cannot be initialized (note 1).
   Advantages:
     This is the current method (i.e. inertia reigns)
     Makes object and a.out files smaller, thus program load time is
       also reduced (note 4).

3) Have their value set to a zero of the appropriate type.
   Disadvantages:
     Requires a somewhat arbitrary rule on "what is the appropriate
       type for a union?"
     Generates larger object files, etc (note 4).
     The programmer cannot signal to the reader that a variable is
       deliberately being left un-initialized.
     Arrays of unknown size cannot be initialized if they contain
       non-zero values.
   Advantages:
     Allows old code to be ported to new machines (note 3).

4) A combination of (1) and (2): Un-initialized variables start off as zero
   in the first overlay that is loaded. Subsequent overlays get whatever was
   left in the storage location by previous overlays.
   Disadvantages:
     Same as for (1), except that existing programs are not broken.
   Advantages:
     Same as (1), except that sloppy coding has a better chance of running.

Note 1: By "array of unknown size", I mean, for example, an array whose size
is a #define'd constant. There is currently no method of giving explicit
initializers to such an array in its entirety, unless the source file is
heavily modified each time the #define'd constant is changed.
Note that the improved CPP facilities (#eval and genuine macros) which I
described in an earlier article would allow such arrays to be initialized to
*any* value (not just zero bit pattern or zero of the appropriate type), thus
making the variations on this disadvantage go poof.

Note 2: Since most systems clear the memory before a program is loaded, for
security purposes, method (1) often flukes out to be method (2).

Note 3: If the purpose of the standard does not include porting existing (old)
programs to new C implementations on "hostile" hardware, this advantage/
disadvantage does not exist. I believe that the new standard should allow
NEW programs to be written portably, and that old programs should continue
to work, but *only on machines on which they already work*.

Note 4: These features (reduced object or a.out size, and overlays) may or may
not exist on any particular system, and they may be non-issues to many users
(because they have lots of disk space, or they think overlays are for the
birds). However, these features *do* exist on some systems, and the users
*do* find them useful, and it would be desirable that the standard *not* be
written such that a compiler has to be non-conforming to take advantage of
such features.

If overlays are going to be ignored, (2) and (4) are equivalent.

Ignoring the problems of upward compatibility and lazy programming styles,
choice (1) is the winner. However, given that old programs must continue to
work, choice (4) looks like the best one. The only bad problem with (4) is
that of array initialization. As mentioned above, this can be solved much
more generally with an improved CPP.

This standard will probably not include such features, or a method of
choosing which union member to initialize. But there will be more C standards
down the road, and these features may appear, making (1), (2) or (4) the
clear winning choices. If the committee goes for choice (3) now, this will
only encourage code which doesn't explicitly initialize things, and make for
an even larger base of software to break when the next standard tries to go
back to choice (1) or (2).

I consider (4) with improved CPP to be the long-range goal, and the
implementation of (3) in the current standard prevents changing to (4) in
the next standard.
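
To make the difference between choices (2) and (3) concrete, and to show the
run-time workaround that note 1 leaves as the only option for an array of
#define'd size, here is a minimal sketch. It is written in later standard C
rather than the dialect of this article, and every name in it (NITEMS,
weights, names, init_tables, all_bits_zero) is hypothetical, chosen only for
illustration. It is one way to do it, not a recommendation from any standard.

```c
#include <stddef.h>

#define NITEMS 8    /* hypothetical size; may change without other edits */

/* Neither array has an initializer in the source. */
static double weights[NITEMS];
static char  *names[NITEMS];

/*
 * Explicit initialization *code*: every element is assigned a zero of
 * its own type (0.0 for double, a null pointer for char *), whatever
 * bit pattern the machine uses to represent those values.
 */
static void init_tables(void)
{
    int i;

    for (i = 0; i < NITEMS; i++) {
        weights[i] = 0.0;
        names[i] = NULL;
    }
}

/*
 * Returns 1 if the n bytes at p are all zero, 0 otherwise.
 * Under choice (2) this is all that is promised about an
 * un-initialized extern or static; it says nothing about whether
 * those bytes mean 0.0 or a null pointer on a given machine.
 */
static int all_bits_zero(const void *p, size_t n)
{
    const unsigned char *b = p;
    size_t i;

    for (i = 0; i < n; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}
```

A program would call init_tables() once before touching either array. After
that call, all_bits_zero(names, sizeof names) may or may not return 1,
depending on how the machine represents a null pointer; that gap is exactly
what separates (2) from (3).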

We can either let it sit as is for now, and fix it properly when the
facilities become available, or we can (for the feeble reason of porting old
shit code to new machines) paint ourselves into yet another corner by fixing
it poorly immediately.

                        Kevin Martin, UofW Software Development Group