Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!ucsd!hub!eiffel!bertrand
From: bertrand@eiffel.UUCP (Bertrand Meyer)
Newsgroups: comp.lang.eiffel
Subject: Static typing for Eiffel
Keywords: The full paper (at long last) - PART 1 (of 2).
Message-ID: <176@eiffel.UUCP>
Date: 10 Jul 89 22:27:04 GMT
Organization: Interactive Software Engineering, Santa Barbara CA
Lines: 661


This article is the discussion on Eiffel types
which has been announced for a rather long time.

I apologize for the delay. Finding the few hours needed to
finish this paper turned out to be more difficult than I
expected when I announced its coming availability in
January after completing the first draft.
The preparation of version 2.2 of the Eiffel system, addressing
problems that are more relevant to Eiffel use, seemed to command
higher priority.

All those people (several hundred) who requested a paper
version back in February will receive one. The mailing won't
be sent until early August.
 

IMPORTANT: The form below is still a draft.
 
-------------------------------------------------------

Warning:

This text has been designed with laser printing in mind
(fonts, subscripts), and some parts may be hard to read
without these. Sorry.

-------------------------------------------------------

Because its size is > 50K, this article has been split
into two parts for posting to Usenet. Please make sure you have
both parts. 


PART 1 OF 2
 

                  STATIC TYPING FOR EIFFEL

                       Bertrand Meyer

            January 1989 (updated, July 2-4, 1989)


1 OVERVIEW


The object-oriented language Eiffel [5] is statically typed:
in   other  words,  type  errors  may  be  catched  by  mere
examination of software texts.

     Unfortunately, the description of the basic type  rules
in  [5],  as  well  as  the  type  checking performed by the
current Eiffel implementations, did not cover certain  kinds
of   type   errors.   These   errors  have  two  interesting
properties:  they  almost  never  occur  in  the   practical
development   of   software  systems  if  proper  management
procedures  are  observed;  yet  it  is  easy,  even  for  a
newcomer,  to  come up with artificial examples that exhibit
some of these errors.

     In recent months several people have independently come
across  these problems and in some cases have even published
their findings as evidence of purported flaws in the  Eiffel
type system.

     This article shows that  the  type  policy  adopted  by
Eiffel  is  sound,  and  actually claims that it is the only
policy compatible with Eiffel's  aim  to  provide  a  clean,
powerful  basis for the practical development of medium- and
large-scale software systems in industrial environments.

     The article gives the precise rule for static typing in
Eiffel and describes the strategy for implementing this rule
as part of the incremental, modular compilation method  used
for the implementation of Eiffel. Lack of time has prevented
the full inclusion of these  checks  in  release  2.2;  they
will, however, be part the next release (3).

     Most of the arguments that have been voiced in previous
discussions  are  theoretical.  In  contrast,  this  article
considers in detail the pragmatic aspects of  using  classes
and  inheritance  in  an  industrial  environment.  A purely
theoretical  approach  misses   the   software   engineering
criteria without which the entire discussion is meaningless.

     Section 2 discusses the notion of typing in  a  general
context.  Section  3  explains what typing and type failures
mean for Eiffel. Section 4 introduces the  rationale  behind
the  Eiffel  type  policy.  Section 5 gives the precise type
rule. Section 6 outlines the  strategy  for  enforcing  this
rule  as part of Eiffel compiling. Section 7 is a conclusion
and assessment.


2 BACKGROUND


During the execution of a computer program, an  attempt  may
be  made  to apply an operation to operands which are not of
the form,  or  type,  expected  by  the  operation.  Such  a
regrettable  course of affairs may be called a type failure.
The rest  of  this  article  discusses  how  to  avoid  type
failures in object-oriented programming.

2.1 Checking types


One  possible  defense  against  type  failures  is  dynamic
checking,  whereby  the  applicability  of  an  operation to
candidate operands is ascertained at run-time,  just  before
the  operation  is  to be applied. For anyone concerned with
software reliability, however, this is  not  a  satisfactory
solution,  since  run-time  is  too  late  a stage to find a
suitable recovery from a type failure.  A better strategy is
static  checking, where potential type failures are detected
by an analysis of software texts, without  execution.   Such
an  analysis  is  performed  by  a mechanism called a static
checker, or just checker in the  rest  of  this  discussion,
which  does  not  consider  any  form of checking other than
static.

     Formally, a checker is a predicate on software texts  -
that is to say, a boolean function which, for any such text,
expresses whether or not the text is  typewise  correct.  In
practice,  the only checkers of interest are those which can
be  implemented  by  computable  algorithms.  The  practical
requirements   are   even   stronger:   to   be   worthy  of
consideration, the algorithm must be efficient, and it  must
be   possible   to  integrate  it  in  a  compiler  for  the
programming language used to write the software.

     A full checker is a [static] checker which is  able  to
guarantee that no type failure may result from the execution
of certain software systems.   The  ability  to  build  full
checkers  depends  on the programming language being used. A
programming language for which it is  possible  to  write  a
practical  and  useful full checker will be said to be fully
typed, or just ``typed'' for short.  (``Typable'' would be a
more  accurate  term  since  the use of such a language will
only  guarantee  the  absence  of  type   failures   if   an
appropriate checker is applied.)

     To  make  full  checking  possible,  typed  programming
languages   require  programmers  to  provide  supplementary
information in the form of  type  declarations.  Programmers
will  explicitly  declare  every  program  entity, such as a
variable or a formal routine argument, as being of a certain
type,  identified  by  a  type name.  The static checker can
then check that every use of the entity is  consistent  with
this declaration. In Pascal, for example, an entity declared
as integer  may  only  participate  in  operations  such  as
addition  or  comparison,  which  are  applicable to integer
arguments.

2.2 The typing problem in object-oriented programming


With the exception of Trellis-Owl [8], most  object-oriented
languages  other  than  Eiffel do not support static typing.
Some, such as Smalltalk and Lisp derivatives  have  followed
the  route  of  dynamic  typing.  C  extensions  such as C++
include a notion of type, but defeat any attempt  at  static
checking  by  providing loopholes (``type casts'') inherited
from C; these loopholes are required for any  realistic  use
of  such a language, in particular because of the absence of
genericity  (as  discussed  below).   Such  a  lax  attitude
towards  typing  is  all  the  more  regrettable that static
checking is particularly  desirable  in  an  object-oriented
context.

     The   fundamental    construct    in    object-oriented
programming is the application of an operation to an object;
this is written entity.feature in  Eiffel,  meaning  ``Apply
the  operation  denoted  by feature to the object denoted by
entity''. (Arguments may be given if feature is a  routine.)
This  will  cause  a  type  failure  if no such operation is
applicable to the given object.

     Under its basic form, the  static  typing  problem  for
Eiffel   is  simple:  this  is  the  problem  of  statically
guaranteeing that such a type failure will never  occur.  (A
more  complete  description  of  the  problem  will be given
below.)

     In other words, the basic problem is the possibility of
writing  a  checker  which,  whenever  it  accepts  a set of
classes containing a case of  the  notation  entity.feature,
guarantees  that  the run-time support mechanism will always
be able to find an operation for feature, applicable to  the
object associated with entity.

     The problem might appear trivial in any  language  that
requires  every  entity  x  to  be  explicitly declared of a
certain type T, defined by  a  class:  then  f  must  be  an
exported  feature  of that class.  The reason things are not
that simple is the  presence  of  two  powerful  mechanisms:
polymorphism, which at run time allows x to denote an object
of a type T',  possibly  different  from  T  and  statically
unpredictable;  and  dynamic  binding, which implies that in
such a case the version of f is the  version  applicable  to
instances  of  T',  which is not necessarily the same as the
version for T.  (Dynamic binding should not be confused with
dynamic  typing;  in  fact  the Eiffel combination of static
typing and dynamic  binding  is  a  particularly  productive
one.)

2.3 Limits of typing


Before analyzing in more detail  how  Eiffel  addresses  the
typing  problem, it is important to consider three practical
aspects of typing, which are a consequence of the  preceding
discussion.

     The first observation  is  that  static  type  checking
cannot  guarantee  that  no  run-time failure will occur: it
will only remove what has been called ``type  failures''  at
the  beginning  of this discussion. Take for example a typed
language where the division operation expects  arguments  of
type REAL.  Then the language and its checkers will flag a/b
as illegal if b is a variable of type CHARACTER; but if both
operands  are  of  the  correct type, the expression will be
considered legal even though the value of b might be zero at
execution time, producing a failure.

     A similar problem exists in the type system of  object-
oriented  programming: a type failure has been defined above
as a case in which entity.feature is  executed  when  entity
denotes  an  object  which has no operation for feature. But
all that a type checker for Eiffel will be able to guarantee
is  that  if  the  object exists, there will be an operation
feature applicable to the object  denoted  by  entity.  This
will cause a type failure if no such operation is applicable
to the given object.

     This  aspect  of  the  typing  problem  is  frustrating
because it means that type checking may be used to catch the
``easy'' errors, such as using a character as denominator in
a  division, but not the ``hard'' ones, such as using a real
denominator which in some cases may be zero.  Unfortunately,
to  catch  the  hard  errors statically would require formal
proofs  of  program  correctness,  a  goal  that,  even   in
decidable  cases,  is  beyond  today's  technology (although
progress towards this goal is definitely possible in  Eiffel
thanks  to  the  assertion mechanism, but this is beyond the
present discussion).  So type checking, which may be  called
the  ``poor man's proof'', only addresses errors that may be
statically  detected  through   relatively   straightforward
mechanisms.

     This observation does not mean that  type  checking  is
useless, if only because the ``easy'' errors that it detects
are among the most important ones  in  practice,  reflecting
inconsistencies  which  often  result  from  serious  design
mistakes.  But it reminds us that type checking  is  only  a
partial  answer  to  the problem of program correctness, and
that the distinction between the errors which are detectable
by type checking and those which are not is a pragmatic one,
having to do with the  power  of  the  detection  mechanisms
available to us.

     A program that has been type-checked and found typewise
correct is guaranteed to execute without type failure; it is
not guaranteed to execute without failure.

     In  Eiffel,   the   remaining   failures   will   cause
exceptions,  which  lead  to abnormal termination unless the
application deals with them explicitly.

2.4 Cost issues


In  the  preceding  discussion  two  kinds  of  errors  were
considered:  ones  that  are very difficult or impossible to
detect in the current state of the technology;  others  that
are  relatively  easy  for  a compiler to find in a language
designed for static typing.

     In practice the situation may not be so clearcut.  Some
checks may be tractable in principle but costly to implement
as part of standard  compiling  mechanisms.  In  traditional
languages,  this  is  often  the  case with inter-procedural
checks. A frequent solution is to provide a  static  checker
(such  as  the  lint tool for C), which is separate from the
compiler and performs more advanced checks.

     One may contend, of course,  that  the  need  for  such
tools  is  a consequence of poorly designed type systems and
in many cases it is. But the underlying idea is not  without
merit.  Checking  implies  definite costs in time. If one is
able  to  classify  errors  according  to  their   practical
probability  or the damages they cause, then the benefit-to-
cost ratio of static checks will usually  show  a  point  of
diminishing  returns.  Then  it  may  make sense to have the
compiler, under its default options, perform the checks with
the  highest benefit-to-cost ratio, the others being carried
out by explicit request only.

     A related practical requirement arises in the  case  of
Eiffel,   a   modular   language   meant   for   incremental
compilation. In any good compiled implementation of  Eiffel,
classes  are  compiled  separately;  if  a  class C has been
type-checked and compiled, a programmer  who  writes  a  new
class  D  depending  on  C  must  clearly be able to check D
without having to check  C  again.   (Dependents  in  Eiffel
include  clients, which have access to the class through its
interface only, and heirs.) This implies a strong constraint
on  the  type  system,  which must be designed in such a way
that the  addition  of  a  new  descendant  never  makes  an
existing class type-wise incorrect.

     Satisfying this  requirement  means  that  it  will  be
possible  to integrate full type checking with the automatic
incremental compilation mechanism used in the current Eiffel
implementation,  which  after  a  change recompiles only the
minimum set of impacted classes, taking export controls into
account  (clients  of  a  class  which has undergone changes
affecting only hidden features are not recompiled).

2.5 Pessimism


The third important observation is  that  type  checking  is
always  a pessimistic policy. Pascal compilers, for example,
will reject a program extract of the form
     procedure p(var i: integer; r: real; var b: BOOLEAN);
          begin
               if b then i := r else i := i + 1
          end

because the first branch  of  the  conditional  attempts  to
assign  a  real  value  to  an integer variable (a forbidden
combination). Yet in a given program it may very well be the
case  that  the  procedure is always called with b false, so
that no type failure would result.

     The problem is not just that a program should  only  be
accepted  if  it  cannot ever produce a type failure. If the
instruction
     b := false

is inserted at the beginning  of  the  above  procedure,  no
execution of this procedure may ever produce a type failure.
Yet  all  Pascal  compilers  will  still  reject  the  above
procedure,  because  they  do  not  perform the kind of flow
analysis that would be necessary  in  the  general  case  to
discover that the type failure may not occur.

     The behavior of compilers (and designers of  statically
typed  languages)  in  such  cases  is  easy  to understand.
Compiling time is a precious  resource  and  should  not  be
wasted  in  trying  to salvage programs that look hopelessly
suspicious, even if once in a while one  of  these  programs
would  in fact work. The principle followed here is ``better
safe than sorry''. If the checker is to err, then it  should
err  on  the safe side by rejecting some programs that would
work, rather than accepting ones that will not.

     So in all non-trivial cases typing is pessimistic. This
means  that  every  language  is typed in a way: a universal
full checker (applicable to Smalltalk, Lisp and APL as  well
as Basic) is the program
     print "Program rejected"

     This example is ludicrous, of course,  but  shows  that
typing  is  a  relative,  not  an absolute notion. The above
definition of a typed language should  thus  be  amended:  A
programming language is fully typed, or just typed, if it is
possible to write a practical and useful full checker  which
will not reject any reasonable program.

     This definition leaves some margin of  appreciation  as
to  what  is a ``reasonable'' program, a notion that may not
be defined rigorously, although ``reasonable'' people should
be  able  to  agree on whether a specific kind of program is
reasonable or not.

2.6 The role of typing: an assessment


The common lesson from the above three comments on typing is
that   in  any  full  discussion  of  typing  the  pragmatic
considerations  are  as   important   as   the   theoretical
discussion.

     The three key questions of  typing  for  a  programming
language are:

1
     + Among all possible run-time failures, which ones  are
       to be considered type failures, to be totally avoided
       by the checking mechanism, and which ones  should  be
       considered  as  non-type  failures  (as  division  by
       zero)?

2
     + Among all possible checks, which ones  should  always
       be performed by the compiler, and which ones (if any)
       should be done on request only?

3
     + Given that  the  checker  will  reject  all  programs
       containing  potential  type failures as defined under
       1, how far can it go in rejecting programs that might
       not lead to such failures if executed?

     The answers to all of these questions must of necessity
rely on an assessment of highly practical issues.

     For  question  1,  the  issue  is   what   checks   can
realistically  be  performed by a compiler or other checker.
This requires familiarity with  the  current  technology  of
compilers and other software tools.

     For question 2, the issue is the benefit-to-cost  ratio
of  various  checks.   This adds to the previous requirement
some knowledge of what errors programmers actually make  and
their actual degrees of gravity.

     For question 3, the issue is what kinds of  potentially
correct  programs  may  be  rejected  by  the  checker. This
demands a practical understanding of how programmers use the
language and what is essential for them.

     What is required in all cases is a software engineering
view  which, beyond the language and its theory, encompasses
its practical application  to  the  construction  of  useful
software.


3 THE EIFFEL TYPING PROBLEM

3.1 Terminology


It is assumed that the reader is familiar  with  Eiffel  and
has  read  [5]  or at least the 12-page introduction in [7].
Any meaningful discussion of type issues in Eiffel  requires
using the proper terminology, which will be recalled here.

     Software texts in  Eiffel  are  made  of  classes.   An
executable  software  product  is  obtained  by assembling a
system, or combination of one or more classes, one of  which
is called the root of the system.

     A  class  describes  a  number  of  potential  run-time
objects,  called its instances.  A class is characterized by
a number of features.  A  feature  is  either  an  attribute
(describing  a modifiable component present in each instance
of  the  class)  or  a  routine  (describing  a  computation
applicable to each instance of the class).

     A routine is either a function, which returns a result,
or  a  procedure,  which does not. A routine may have formal
arguments; if so, calls to the  routines  must  include  the
corresponding actual arguments (expressions).

     An entity is a name in a class text which stands for  a
value at run-time. For this discussion we ignore entities of
``expanded types'' which include basic types (integer, real,
boolean,  character)  and  for  which  polymorphism  is  not
supported; so the only values of interest  are  those  whose
run-time values are references. A reference either points to
an object or is void.  The following forms of  entities  are
provided:

     + Attributes of a class.

     + Local variables in routines.

     + The predefined entity Result which can only appear in
       a  function.   (The value returned by a function call
       is the final value of Result on  termination  of  the
       function's execution.)

     + Formal arguments of a routine.

     An attribute or local variable may be used as target of
an assignment, written
     x := y

where x is an entity and y is an expression. The  effect  of
such  an  assignment  is to make x void if the value of y is
void, and otherwise to make x refer to the same object as y.
The  latter  case  produces  aliasing  between  x and y; two
entities  are  said  to  be  aliased  if  there  values  are
references  to  the  same  object.   Aliasing  also may also
result from a routine call where y is  the  actual  argument
and x the corresponding formal  argument.

     What makes  the  object-oriented  typing  problem  non-
trivial is the presence of polymorphic aliasing, where x and
y are of different class types. The typing problem  is  then
that  the  same  object  is  accessible through names of two
different types, which may have different rules with respect
to what features are exported and what feature arguments are
permissible.

3.2 The basic type rules


Two  fundamental  type  rules  (which   have   always   been
statically  enforced  by our implementation of Eiffel) serve
to avoid most of  the  cases  that  might  lead  to  a  type
failure.  They  are given in sections 10.1.4 and 11.3 of [5]
and repeated here in simplified form.

     The first rule governs feature application: x.f  (...),
where  x  is  of  class  type C, is only legal if one of the
ancestors of C (possibly C itself) defines a feature f,  and
C exports f.

     The second rule governs polymorphic aliasing: x := y is
only  legal  if  the  type  of y say D, conforms to C.  (For
non-generic classes, this means that D is a descendant of C;
for  generic  classes,  conformance  also includes recursive
conditions on the generic parameters.) This also applies  to
argument  passing  if  x  is  the  formal argument and y the
corresponding actual argument.

     This  second  rule  is  also  what  lies   behind   the
possibility  of  defining  flexible  generic data structures
such as
     stack_of_C: STACK [C]

with the effect that the push routine applied to  stack_of_C
will  accept  as argument any expression whose type conforms
to C.

     Taken together, the two rules exclude most type errors.
Nevertheless  some  type  failures  are  still  possible  in
principle.

3.3 Violations


Type failures may occur in Eiffel  in  spite  of  the  above
rules  because  of  the  following  two  properties  of  the
language:

     + The information hiding mechanism is orthogonal to the
       inheritance  mechanism.  In  other  words,  a feature
       hidden (exported) by a class may be exported (hidden)
       by any of its proper descendants.

     + When a routine is redefined in a descendant, the type
       of  a formal argument may be redefined to be a proper
       descendant of the original type.  This  is  sometimes
       called the covariant rule.

     Both of these rules makes it easy to construct examples
that  lead  to type failures. Let PARENT be a class and HEIR
one of its heirs. Assume PARENT exports feature f, but  HEIR
does  not.   Then  the  first  type rule makes the following
legal:
/1/
     p: PARENT; h: HEIR;
     ...
          h.Create;
          p := h;
          p.f

     Here  because  of  dynamic  binding  we  are   sneakily
applying  operation f to an object of type HEIR, even though
the designer of class HEIR has expressly removed f from  the
list  of operations applicable to such objects.  Why this is
indeed bad will be analyzed in more detailed below.

     As an example of the second kind, assume that routine g
is  defined  in  PARENT with an argument of type PARENT, and
redefined in HEIR with  an  argument  of  type  HEIR.  (More
generally the respective argument types might be any P and H
such that H conforms to  P;  here  for  simplicity  we  have
chosen  P  and  H to be the enclosing classes themselves. In
practice, explicit redeclaration is avoided in such  a  case
by  using  a ``declaration by association'' of the form like
Current.) Then the second  type  rule  makes  the  following
legal:
/2/
     p: PARENT; h: HEIR;
     ...
          h.Create;
          p := h;
          p.g (p)

     Here we are sneakily calling g on  an  object  of  type
HEIR with an argument of dynamic type PARENT, not HEIR.

3.4 Analysis of failures


Why are the above situations examples of type failures?

     In case /2/, the problem is clear: since the version of
g  redefined  in  HEIR  assumes an argument of type HEIR, it
might well access an attribute defined for objects  of  type
HEIR  but  not for those of type PARENT. Such a type failure
will usually trigger a run-time exception.

     Case /1/ is more subtle, as will be seen if we  compare
it  to the following almost identical variant, which assumes
the reverse export policy: f exported by  HEIR  but  not  by
PARENT.
/3/
     p: PARENT; h: HEIR;
     ...
          h.Create;
          p := h;
          h.f

     In /3/ to we again apparently use aliasing (the ability
to  refer  to an object through more than one name) as a way
to cheat  with  export  restrictions.  Even  though  p.f  is
illegal,  we  can apply f to the object associated with p by
accessing it through its HEIR name.

     In /3/, however, nothing bad can really occur;  because
the  designer of HEIR decided to export f, it means that (if
the class is correct) f may be correctly applied to  objects
of  this  type. In precise terms, f, being exported by HEIR,
must preserve the invariant of this class. (No discussion of
these  issues  makes sense unless it refers to the notion of
class invariant.)

     For /1/  the  situation  is  different.  Feature  f  is
exported  by PARENT, meaning that it preserves the invariant
of this class.   But  it  may  well  fail  to  preserve  the
invariant  of HEIR (this by itself being a sufficient reason
for HEIR not  to  export  f).   By  using  polymorphism  and
dynamic  binding,  the  client  class containing extract /1/
succeeds in applying f to an  object  of  type  HEIR;  as  a
result,  the object may enter into an incorrect state (which
usually will soon produce an exception).

     Cases /1/ and /2/ are prototypical examples of the  bad
cases  that  may  occur.  Others  (involving  in  particular
genericity  and  declaration  by   association)   are   mere
variations  on  these  basic  patterns. /3/, as noted above,
does not lead to type failures.

     For all their  obviousness,  these  cases  very  seldom
occur  in  practice  in  a project whose aim is to construct
useful software rather than to crash the Eiffel environment.
For  example,  it  is  interesting  to  mention that no type
failure has ever been recorded in the  usage  of  Eiffel  at
Interactive  Software Engineering.  This explains why fixing
the type rules and the corresponding type checking performed
by the compiler has not been a top priority.

     But of course the loopholes must eventually be closed.


                THIS IS THE END OF PART 1 (OF 2)


-- 

-- Bertrand Meyer
bertrand@eiffel.com