Path: utzoo!attcan!uunet!bywater!arnor!lusitania!lowry
From: lowry@arnor.uucp
Newsgroups: comp.lang.misc
Subject: Re: C's sins of commission
Message-ID: <1990Oct21.180739.8933@arnor.uucp>
Date: 21 Oct 90 18:07:39 GMT
References: <64618@lanl.gov> <2883@igloo.scum.com> <2171@enea.se> <1990Oct8.135551.21639@arnor.uucp> <1990Oct10.101527.2247@maths.nott.ac.uk> <1990Oct15.204343.2907@arnor.uucp> <1990Oct19.160210.9787@maths.nott.ac.uk>
Sender: news@arnor.uucp (NNTP News Poster)
Reply-To: lowry@lusitania.watson.ibm.com (Andy Lowry)
Organization: IBM T. J. Watson Research Center
Lines: 73

In article <1990Oct19.160210.9787@maths.nott.ac.uk>, anw@maths.nott.ac.uk (Dr A. N. Walker) writes:
|> There are also some problems with compilers that change
|> algorithms behind the programmer's back, depending on some complicated
|> analysis. Suppose my program has a bug that is benign in some
|> implementations, but not in others. Lo and behold, all my test programs
|> work, the production run fails, and as soon as I try to isolate the bug,
|> it goes away. (Yes, I know we all have bugs like that sometimes!)

You're right, I have experienced many such bugs in my time. I can see
such a scenario coming about for three reasons:

(1) there is a bug in the program, but in some implementations it does
    no damage and has no visible effect on the program (e.g. the program
    fails to initialize a counter to zero before referencing it, but
    since the operating system happens to zero static variables during
    program load, things work out);

(2) the program has different resource requirements under different
    implementations, and therefore the two implementations act
    differently in the face of resource depletion;

(3) the compiler introduces a bug when applying its transformations.

Category 1 is addressed in Hermes by a new compile-time mechanism we
call "typestate checking."
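As a rough preview of the kind of analysis involved, here is a toy
Python sketch of typestate checking for one property, initialization.
It is an illustration under assumptions of my own, not Hermes' actual
rules: the tuple mini-language (`assign`, `use`, `if`, `while`), the
two-point lattice `UNINIT < INIT`, and the names `check`, `meet`, and
`TypestateError` are all hypothetical.

```python
from enum import IntEnum


class TS(IntEnum):
    """Two-point typestate lattice for initialization: UNINIT < INIT."""
    UNINIT = 0
    INIT = 1


class TypestateError(Exception):
    """Raised when an operation's typestate precondition does not hold."""


def meet(a, b):
    """Lattice meet of two typestate maps: each variable takes the lower state.

    A variable missing from a map is treated as UNINIT.
    """
    return {v: TS(min(a.get(v, TS.UNINIT), b.get(v, TS.UNINIT)))
            for v in a.keys() | b.keys()}


def check(block, state, coercions):
    """Return the typestate map after `block`, or raise TypestateError.

    Hypothetical mini-language (not Hermes syntax):
      ("assign", x)        postcondition: x becomes INIT
      ("use", x)           precondition: x must be INIT
      ("if", then, other)  check both branches, then take the meet
      ("while", body)      iterate the loop-head state to a fixed point
    Coercions needed at if-merges are appended to `coercions`
    as (branch, variable) pairs.
    """
    state = dict(state)
    for stmt in block:
        kind = stmt[0]
        if kind == "assign":
            state[stmt[1]] = TS.INIT
        elif kind == "use":
            if state.get(stmt[1], TS.UNINIT) != TS.INIT:
                raise TypestateError(f"{stmt[1]!r} used before initialization")
        elif kind == "if":
            s_then = check(stmt[1], state, coercions)
            s_other = check(stmt[2], state, coercions)
            merged = meet(s_then, s_other)
            # Each incoming path whose state is above the meet gets a coercion
            # lowering it (for INIT -> UNINIT, that means disposing the value).
            for label, s in (("then", s_then), ("else", s_other)):
                for v, ts in s.items():
                    if ts > merged[v]:
                        coercions.append((label, v))
            state = merged
        elif kind == "while":
            # Loop head = meet of the entry state and the state after the body;
            # repeat until it stops changing. The lattice is finite and the head
            # only moves down, so this terminates. (Coercions at the loop-head
            # merge itself are not recorded in this sketch.)
            head = state
            while True:
                body_coercions = []
                after_body = check(stmt[1], head, body_coercions)
                new_head = meet(state, after_body)
                if new_head == head:
                    break
                head = new_head
            coercions.extend(body_coercions)
            state = head
        else:
            raise ValueError(f"unknown statement kind {kind!r}")
    return state
```

For example, `check([("if", [("assign", "x")], []), ("use", "x")], {}, c)`
raises `TypestateError`, because `x` is initialized on only one branch,
and leaves `[("then", "x")]` in `c`: the coercion that would lower `x`
back to uninitialized (dispose of its value) on the `then` path.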
We use dataflow analysis techniques to track static characteristics of
variables, such as their initialization state and the case of variants.
Each operation of the language has typestate preconditions, which define
what typestate conditions must hold for the operation's operand
variables, and typestate postcondition rules, which define how the
operation affects the operand typestates. The compiler uses the
postconditions to compute a fixed-point typestate at every point in the
program, and rejects any program that attempts an operation with
operands that are not in the correct typestate. In addition, when two
or more program paths merge, a meet operation (in the typestate
lattice) is performed to compute a single typestate for the merge
point, and "coercion" operations are inserted on all the incoming paths
to lower their individual typestates to the computed meet.

One effect of the coercions is that the program is augmented at compile
time to automatically dispose of its data values at appropriate points.
We get automatic garbage collection without the run-time cost normally
associated with this feature.

Typestate checking eliminates category 1, because it makes it impossible
to write programs that do not have a precise, implementation-independent
meaning. Thus, in the absence of compiler bugs, and given adequate
resources, one is guaranteed that two implementations of the same buggy
program will both exhibit the bug.

Category 2 is a difficult problem. If memory is tight, an implementation
that keeps lots of auxiliary data structures to speed certain operations
may fail, whereas a slower implementation that requires less memory will
succeed. In Hermes we provide a built-in exception called "Depletion"
that is meant to handle resource depletions, such as memory depletion,
computing a numerical value that exceeds the hardware's range or
accuracy limits, insufficient sockets available for network
communications, etc.
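A loose Python analogue of the "Depletion" idea, as a sketch rather
than Hermes code: `Depletion`, `checked`, and `exp_with_fallback` are
hypothetical names. Python reports some such conditions through its own
built-in exceptions, e.g. `MemoryError`, or the `OverflowError` that
`math.exp` raises when its result is too large for a float. The wrapper
re-raises them as a single depletion exception, and the caller recovers
by switching to software arithmetic from the standard `decimal` module.

```python
import math
from decimal import Decimal


class Depletion(Exception):
    """Hypothetical stand-in for Hermes' built-in Depletion exception."""


def checked(fn, *args):
    """Call fn(*args), reporting Python's resource errors as Depletion."""
    try:
        return fn(*args)
    except (MemoryError, OverflowError) as exc:
        raise Depletion(f"{fn.__name__}: {exc}") from exc


def exp_with_fallback(x):
    """e**x in hardware floats, falling back to software decimal arithmetic."""
    try:
        return checked(math.exp, x)
    except Depletion:
        # The result exceeds the float range, so retry with the decimal
        # module's software arithmetic, whose exponent range is far larger.
        return Decimal(x).exp()
```

Here `exp_with_fallback(1.0)` returns an ordinary float, while
`exp_with_fallback(1000)` overflows the float range and comes back as a
`Decimal` of roughly 1.97E+434.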
The Hermes programmer is not spared these exceptions (though we have
some ideas on how this might be achieved). Implementations are free to
use whatever tricks are available to avoid depletions (like swapping
real memory to disk, migrating data to other network sites, using
extended-precision arithmetic software, etc.), but we don't make it a
requirement for language conformance. The pragma mechanism I've
mentioned before may be able to address many of these situations by
allowing the programmer to encourage the compiler, for example, to be
sparing in its memory utilization.

Category 3 is a problem in any language except machine language (even
assemblers can introduce addressing errors). The best I know how to do
is to continue paying especially close attention to compilers and other
low-level software. Adopting languages and methodologies for this
software that reduce the frequency of bugs (like typestate checking)
will help, of course.

-- 
Andy Lowry, lowry@ibm.com, (914) 784-7925
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY 10598