Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!ucsd!hub!eiffel!bertrand From: bertrand@eiffel.UUCP (Bertrand Meyer) Newsgroups: comp.lang.eiffel Subject: Static typing for Eiffel Keywords: The full paper (at long last) - PART 1 (of 2). Message-ID: <176@eiffel.UUCP> Date: 10 Jul 89 22:27:04 GMT Organization: Interactive Software Engineering, Santa Barbara CA Lines: 661 This article is the discussion on Eiffel types which has been announced for a rather long time. I apologize for the delay. Finding the few hours needed to finish this paper turned out to be more difficult than I expected when I announced its coming availability in January after completing the first draft. The preparation of version 2.2 of the Eiffel system, addressing problems that are more relevant to Eiffel use, seemed to command higher priority. All those people (several hundred) who requested a paper version back in February will receive one. The mailing won't be sent until early August. IMPORTANT: The form below is still a draft. ------------------------------------------------------- Warning: This text has been designed with laser printing in mind (fonts, subscripts), and some parts may be hard to read without these. Sorry. ------------------------------------------------------- Because its size is > 50K, this article has been split into two parts for posting to Usenet. Please make sure you have both parts. PART 1 OF 2 STATIC TYPING FOR EIFFEL Bertrand Meyer January 1989 (updated, July 2-4, 1989) 1 OVERVIEW The object-oriented language Eiffel [5] is statically typed: in other words, type errors may be catched by mere examination of software texts. Unfortunately, the description of the basic type rules in [5], as well as the type checking performed by the current Eiffel implementations, did not cover certain kinds of type errors. These errors have two interesting properties: they almost never occur in the practical development of software systems if proper management procedures are observed; yet it is easy, even for a newcomer, to come up with artificial examples that exhibit some of these errors. In recent months several people have independently come across these problems and in some cases have even published their findings as evidence of purported flaws in the Eiffel type system. This article shows that the type policy adopted by Eiffel is sound, and actually claims that it is the only policy compatible with Eiffel's aim to provide a clean, powerful basis for the practical development of medium- and large-scale software systems in industrial environments. The article gives the precise rule for static typing in Eiffel and describes the strategy for implementing this rule as part of the incremental, modular compilation method used for the implementation of Eiffel. Lack of time has prevented the full inclusion of these checks in release 2.2; they will, however, be part the next release (3). Most of the arguments that have been voiced in previous discussions are theoretical. In contrast, this article considers in detail the pragmatic aspects of using classes and inheritance in an industrial environment. A purely theoretical approach misses the software engineering criteria without which the entire discussion is meaningless. Section 2 discusses the notion of typing in a general context. Section 3 explains what typing and type failures mean for Eiffel. Section 4 introduces the rationale behind the Eiffel type policy. Section 5 gives the precise type rule. Section 6 outlines the strategy for enforcing this rule as part of Eiffel compiling. Section 7 is a conclusion and assessment. 2 BACKGROUND During the execution of a computer program, an attempt may be made to apply an operation to operands which are not of the form, or type, expected by the operation. Such a regrettable course of affairs may be called a type failure. The rest of this article discusses how to avoid type failures in object-oriented programming. 2.1 Checking types One possible defense against type failures is dynamic checking, whereby the applicability of an operation to candidate operands is ascertained at run-time, just before the operation is to be applied. For anyone concerned with software reliability, however, this is not a satisfactory solution, since run-time is too late a stage to find a suitable recovery from a type failure. A better strategy is static checking, where potential type failures are detected by an analysis of software texts, without execution. Such an analysis is performed by a mechanism called a static checker, or just checker in the rest of this discussion, which does not consider any form of checking other than static. Formally, a checker is a predicate on software texts - that is to say, a boolean function which, for any such text, expresses whether or not the text is typewise correct. In practice, the only checkers of interest are those which can be implemented by computable algorithms. The practical requirements are even stronger: to be worthy of consideration, the algorithm must be efficient, and it must be possible to integrate it in a compiler for the programming language used to write the software. A full checker is a [static] checker which is able to guarantee that no type failure may result from the execution of certain software systems. The ability to build full checkers depends on the programming language being used. A programming language for which it is possible to write a practical and useful full checker will be said to be fully typed, or just ``typed'' for short. (``Typable'' would be a more accurate term since the use of such a language will only guarantee the absence of type failures if an appropriate checker is applied.) To make full checking possible, typed programming languages require programmers to provide supplementary information in the form of type declarations. Programmers will explicitly declare every program entity, such as a variable or a formal routine argument, as being of a certain type, identified by a type name. The static checker can then check that every use of the entity is consistent with this declaration. In Pascal, for example, an entity declared as integer may only participate in operations such as addition or comparison, which are applicable to integer arguments. 2.2 The typing problem in object-oriented programming With the exception of Trellis-Owl [8], most object-oriented languages other than Eiffel do not support static typing. Some, such as Smalltalk and Lisp derivatives have followed the route of dynamic typing. C extensions such as C++ include a notion of type, but defeat any attempt at static checking by providing loopholes (``type casts'') inherited from C; these loopholes are required for any realistic use of such a language, in particular because of the absence of genericity (as discussed below). Such a lax attitude towards typing is all the more regrettable that static checking is particularly desirable in an object-oriented context. The fundamental construct in object-oriented programming is the application of an operation to an object; this is written entity.feature in Eiffel, meaning ``Apply the operation denoted by feature to the object denoted by entity''. (Arguments may be given if feature is a routine.) This will cause a type failure if no such operation is applicable to the given object. Under its basic form, the static typing problem for Eiffel is simple: this is the problem of statically guaranteeing that such a type failure will never occur. (A more complete description of the problem will be given below.) In other words, the basic problem is the possibility of writing a checker which, whenever it accepts a set of classes containing a case of the notation entity.feature, guarantees that the run-time support mechanism will always be able to find an operation for feature, applicable to the object associated with entity. The problem might appear trivial in any language that requires every entity x to be explicitly declared of a certain type T, defined by a class: then f must be an exported feature of that class. The reason things are not that simple is the presence of two powerful mechanisms: polymorphism, which at run time allows x to denote an object of a type T', possibly different from T and statically unpredictable; and dynamic binding, which implies that in such a case the version of f is the version applicable to instances of T', which is not necessarily the same as the version for T. (Dynamic binding should not be confused with dynamic typing; in fact the Eiffel combination of static typing and dynamic binding is a particularly productive one.) 2.3 Limits of typing Before analyzing in more detail how Eiffel addresses the typing problem, it is important to consider three practical aspects of typing, which are a consequence of the preceding discussion. The first observation is that static type checking cannot guarantee that no run-time failure will occur: it will only remove what has been called ``type failures'' at the beginning of this discussion. Take for example a typed language where the division operation expects arguments of type REAL. Then the language and its checkers will flag a/b as illegal if b is a variable of type CHARACTER; but if both operands are of the correct type, the expression will be considered legal even though the value of b might be zero at execution time, producing a failure. A similar problem exists in the type system of object- oriented programming: a type failure has been defined above as a case in which entity.feature is executed when entity denotes an object which has no operation for feature. But all that a type checker for Eiffel will be able to guarantee is that if the object exists, there will be an operation feature applicable to the object denoted by entity. This will cause a type failure if no such operation is applicable to the given object. This aspect of the typing problem is frustrating because it means that type checking may be used to catch the ``easy'' errors, such as using a character as denominator in a division, but not the ``hard'' ones, such as using a real denominator which in some cases may be zero. Unfortunately, to catch the hard errors statically would require formal proofs of program correctness, a goal that, even in decidable cases, is beyond today's technology (although progress towards this goal is definitely possible in Eiffel thanks to the assertion mechanism, but this is beyond the present discussion). So type checking, which may be called the ``poor man's proof'', only addresses errors that may be statically detected through relatively straightforward mechanisms. This observation does not mean that type checking is useless, if only because the ``easy'' errors that it detects are among the most important ones in practice, reflecting inconsistencies which often result from serious design mistakes. But it reminds us that type checking is only a partial answer to the problem of program correctness, and that the distinction between the errors which are detectable by type checking and those which are not is a pragmatic one, having to do with the power of the detection mechanisms available to us. A program that has been type-checked and found typewise correct is guaranteed to execute without type failure; it is not guaranteed to execute without failure. In Eiffel, the remaining failures will cause exceptions, which lead to abnormal termination unless the application deals with them explicitly. 2.4 Cost issues In the preceding discussion two kinds of errors were considered: ones that are very difficult or impossible to detect in the current state of the technology; others that are relatively easy for a compiler to find in a language designed for static typing. In practice the situation may not be so clearcut. Some checks may be tractable in principle but costly to implement as part of standard compiling mechanisms. In traditional languages, this is often the case with inter-procedural checks. A frequent solution is to provide a static checker (such as the lint tool for C), which is separate from the compiler and performs more advanced checks. One may contend, of course, that the need for such tools is a consequence of poorly designed type systems and in many cases it is. But the underlying idea is not without merit. Checking implies definite costs in time. If one is able to classify errors according to their practical probability or the damages they cause, then the benefit-to- cost ratio of static checks will usually show a point of diminishing returns. Then it may make sense to have the compiler, under its default options, perform the checks with the highest benefit-to-cost ratio, the others being carried out by explicit request only. A related practical requirement arises in the case of Eiffel, a modular language meant for incremental compilation. In any good compiled implementation of Eiffel, classes are compiled separately; if a class C has been type-checked and compiled, a programmer who writes a new class D depending on C must clearly be able to check D without having to check C again. (Dependents in Eiffel include clients, which have access to the class through its interface only, and heirs.) This implies a strong constraint on the type system, which must be designed in such a way that the addition of a new descendant never makes an existing class type-wise incorrect. Satisfying this requirement means that it will be possible to integrate full type checking with the automatic incremental compilation mechanism used in the current Eiffel implementation, which after a change recompiles only the minimum set of impacted classes, taking export controls into account (clients of a class which has undergone changes affecting only hidden features are not recompiled). 2.5 Pessimism The third important observation is that type checking is always a pessimistic policy. Pascal compilers, for example, will reject a program extract of the form procedure p(var i: integer; r: real; var b: BOOLEAN); begin if b then i := r else i := i + 1 end because the first branch of the conditional attempts to assign a real value to an integer variable (a forbidden combination). Yet in a given program it may very well be the case that the procedure is always called with b false, so that no type failure would result. The problem is not just that a program should only be accepted if it cannot ever produce a type failure. If the instruction b := false is inserted at the beginning of the above procedure, no execution of this procedure may ever produce a type failure. Yet all Pascal compilers will still reject the above procedure, because they do not perform the kind of flow analysis that would be necessary in the general case to discover that the type failure may not occur. The behavior of compilers (and designers of statically typed languages) in such cases is easy to understand. Compiling time is a precious resource and should not be wasted in trying to salvage programs that look hopelessly suspicious, even if once in a while one of these programs would in fact work. The principle followed here is ``better safe than sorry''. If the checker is to err, then it should err on the safe side by rejecting some programs that would work, rather than accepting ones that will not. So in all non-trivial cases typing is pessimistic. This means that every language is typed in a way: a universal full checker (applicable to Smalltalk, Lisp and APL as well as Basic) is the program print "Program rejected" This example is ludicrous, of course, but shows that typing is a relative, not an absolute notion. The above definition of a typed language should thus be amended: A programming language is fully typed, or just typed, if it is possible to write a practical and useful full checker which will not reject any reasonable program. This definition leaves some margin of appreciation as to what is a ``reasonable'' program, a notion that may not be defined rigorously, although ``reasonable'' people should be able to agree on whether a specific kind of program is reasonable or not. 2.6 The role of typing: an assessment The common lesson from the above three comments on typing is that in any full discussion of typing the pragmatic considerations are as important as the theoretical discussion. The three key questions of typing for a programming language are: 1 + Among all possible run-time failures, which ones are to be considered type failures, to be totally avoided by the checking mechanism, and which ones should be considered as non-type failures (as division by zero)? 2 + Among all possible checks, which ones should always be performed by the compiler, and which ones (if any) should be done on request only? 3 + Given that the checker will reject all programs containing potential type failures as defined under 1, how far can it go in rejecting programs that might not lead to such failures if executed? The answers to all of these questions must of necessity rely on an assessment of highly practical issues. For question 1, the issue is what checks can realistically be performed by a compiler or other checker. This requires familiarity with the current technology of compilers and other software tools. For question 2, the issue is the benefit-to-cost ratio of various checks. This adds to the previous requirement some knowledge of what errors programmers actually make and their actual degrees of gravity. For question 3, the issue is what kinds of potentially correct programs may be rejected by the checker. This demands a practical understanding of how programmers use the language and what is essential for them. What is required in all cases is a software engineering view which, beyond the language and its theory, encompasses its practical application to the construction of useful software. 3 THE EIFFEL TYPING PROBLEM 3.1 Terminology It is assumed that the reader is familiar with Eiffel and has read [5] or at least the 12-page introduction in [7]. Any meaningful discussion of type issues in Eiffel requires using the proper terminology, which will be recalled here. Software texts in Eiffel are made of classes. An executable software product is obtained by assembling a system, or combination of one or more classes, one of which is called the root of the system. A class describes a number of potential run-time objects, called its instances. A class is characterized by a number of features. A feature is either an attribute (describing a modifiable component present in each instance of the class) or a routine (describing a computation applicable to each instance of the class). A routine is either a function, which returns a result, or a procedure, which does not. A routine may have formal arguments; if so, calls to the routines must include the corresponding actual arguments (expressions). An entity is a name in a class text which stands for a value at run-time. For this discussion we ignore entities of ``expanded types'' which include basic types (integer, real, boolean, character) and for which polymorphism is not supported; so the only values of interest are those whose run-time values are references. A reference either points to an object or is void. The following forms of entities are provided: + Attributes of a class. + Local variables in routines. + The predefined entity Result which can only appear in a function. (The value returned by a function call is the final value of Result on termination of the function's execution.) + Formal arguments of a routine. An attribute or local variable may be used as target of an assignment, written x := y where x is an entity and y is an expression. The effect of such an assignment is to make x void if the value of y is void, and otherwise to make x refer to the same object as y. The latter case produces aliasing between x and y; two entities are said to be aliased if there values are references to the same object. Aliasing also may also result from a routine call where y is the actual argument and x the corresponding formal argument. What makes the object-oriented typing problem non- trivial is the presence of polymorphic aliasing, where x and y are of different class types. The typing problem is then that the same object is accessible through names of two different types, which may have different rules with respect to what features are exported and what feature arguments are permissible. 3.2 The basic type rules Two fundamental type rules (which have always been statically enforced by our implementation of Eiffel) serve to avoid most of the cases that might lead to a type failure. They are given in sections 10.1.4 and 11.3 of [5] and repeated here in simplified form. The first rule governs feature application: x.f (...), where x is of class type C, is only legal if one of the ancestors of C (possibly C itself) defines a feature f, and C exports f. The second rule governs polymorphic aliasing: x := y is only legal if the type of y say D, conforms to C. (For non-generic classes, this means that D is a descendant of C; for generic classes, conformance also includes recursive conditions on the generic parameters.) This also applies to argument passing if x is the formal argument and y the corresponding actual argument. This second rule is also what lies behind the possibility of defining flexible generic data structures such as stack_of_C: STACK [C] with the effect that the push routine applied to stack_of_C will accept as argument any expression whose type conforms to C. Taken together, the two rules exclude most type errors. Nevertheless some type failures are still possible in principle. 3.3 Violations Type failures may occur in Eiffel in spite of the above rules because of the following two properties of the language: + The information hiding mechanism is orthogonal to the inheritance mechanism. In other words, a feature hidden (exported) by a class may be exported (hidden) by any of its proper descendants. + When a routine is redefined in a descendant, the type of a formal argument may be redefined to be a proper descendant of the original type. This is sometimes called the covariant rule. Both of these rules makes it easy to construct examples that lead to type failures. Let PARENT be a class and HEIR one of its heirs. Assume PARENT exports feature f, but HEIR does not. Then the first type rule makes the following legal: /1/ p: PARENT; h: HEIR; ... h.Create; p := h; p.f Here because of dynamic binding we are sneakily applying operation f to an object of type HEIR, even though the designer of class HEIR has expressly removed f from the list of operations applicable to such objects. Why this is indeed bad will be analyzed in more detailed below. As an example of the second kind, assume that routine g is defined in PARENT with an argument of type PARENT, and redefined in HEIR with an argument of type HEIR. (More generally the respective argument types might be any P and H such that H conforms to P; here for simplicity we have chosen P and H to be the enclosing classes themselves. In practice, explicit redeclaration is avoided in such a case by using a ``declaration by association'' of the form like Current.) Then the second type rule makes the following legal: /2/ p: PARENT; h: HEIR; ... h.Create; p := h; p.g (p) Here we are sneakily calling g on an object of type HEIR with an argument of dynamic type PARENT, not HEIR. 3.4 Analysis of failures Why are the above situations examples of type failures? In case /2/, the problem is clear: since the version of g redefined in HEIR assumes an argument of type HEIR, it might well access an attribute defined for objects of type HEIR but not for those of type PARENT. Such a type failure will usually trigger a run-time exception. Case /1/ is more subtle, as will be seen if we compare it to the following almost identical variant, which assumes the reverse export policy: f exported by HEIR but not by PARENT. /3/ p: PARENT; h: HEIR; ... h.Create; p := h; h.f In /3/ to we again apparently use aliasing (the ability to refer to an object through more than one name) as a way to cheat with export restrictions. Even though p.f is illegal, we can apply f to the object associated with p by accessing it through its HEIR name. In /3/, however, nothing bad can really occur; because the designer of HEIR decided to export f, it means that (if the class is correct) f may be correctly applied to objects of this type. In precise terms, f, being exported by HEIR, must preserve the invariant of this class. (No discussion of these issues makes sense unless it refers to the notion of class invariant.) For /1/ the situation is different. Feature f is exported by PARENT, meaning that it preserves the invariant of this class. But it may well fail to preserve the invariant of HEIR (this by itself being a sufficient reason for HEIR not to export f). By using polymorphism and dynamic binding, the client class containing extract /1/ succeeds in applying f to an object of type HEIR; as a result, the object may enter into an incorrect state (which usually will soon produce an exception). Cases /1/ and /2/ are prototypical examples of the bad cases that may occur. Others (involving in particular genericity and declaration by association) are mere variations on these basic patterns. /3/, as noted above, does not lead to type failures. For all their obviousness, these cases very seldom occur in practice in a project whose aim is to construct useful software rather than to crash the Eiffel environment. For example, it is interesting to mention that no type failure has ever been recorded in the usage of Eiffel at Interactive Software Engineering. This explains why fixing the type rules and the corresponding type checking performed by the compiler has not been a top priority. But of course the loopholes must eventually be closed. THIS IS THE END OF PART 1 (OF 2) -- -- Bertrand Meyer bertrand@eiffel.com