Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!usc!masotti
From: masotti@usc.edu (Glauco Masotti)
Newsgroups: comp.lang.c++
Subject: EC++.TUTORIAL
Message-ID: <MASOTTI.89Oct19094730@lipari.usc.edu>
Date: 19 Oct 89 16:47:30 GMT
Sender: news@usc.edu
Distribution: comp.lang.c++, comp.object
Organization: University of Southern California
Lines: 682


                A TUTORIAL INTRODUCTION TO
                    EC++: EXTENDED C++

                   by Glauco Masotti(*)

         University of Southern California, Los Angeles

                     September, 1989.

 
(*) The Author is also with Ferrari Engineering, Modena, Italy.


SUMMARY.

   EC++ stands for "Extended C++" and is an attempt to overcome some of the
difficulties encountered working with C++, to make programming large systems
somewhat more enjoyable. EC++ implements some extensions to the C++ language and
provides an environment to support:
 - assertion checking, in the form of function preconditions and postconditions,
and class invariants;
 - parameterized classes;
 - exception handling;
 - garbage collection.  
   The extensions to the language are translated by a preprocessor (which is
itself called EC++) into plain C++. The overall environment, is set up with an
additional small number of files that define some required macros and classes
that will be used by the applications.


MOTIVATIONS: PROBLEMS AND WEAKNESSES OF C++.


Assertions.

   Assertions are essentially logic statements that should be be verified at
certains points in the code. Examples are invariants that must be satisfied by
objects of a class at all times; preconditions that must be satified whenever
a routine (function of any type) is called; postconditions that must
guaranteed to be true after routine completion. A good presentation of the use
of assertions can be found in [2]. No support is provided in C++ to express
assertions in a disciplined way, and no emphasis has been put on them by other
languages, except Eiffel.
   However, judging on my past experience with large software projects, and
the present experience with EC++, assertions are very useful to write correct
and robust software. Assertions have to be considered as an integral part of
the source code.
   First of all, they play the role of documentation of the contract that
controls the relationship between a function and its users: the precondition
binds the users, the postcondition binds the function.  This make clear what
are the assumed working conditions for a function and whose responsibility it
is to provide or ascertain required conditions. Making software coherent with
the specified assertions therefore will require the necessary checks, but also
eliminates wildly diffuse checking tests thoughout the code, which is what
happens when the contract is not specified.
   Second, they enable us to express formal properties of classes and
routines. This use of assertions may help particularly in the high-level
design phase of a system, in combination with the use of abstract or virtual
functions. Details of implementation will be provided only later, but
requirements and properties of fundamental classes and relevant effects of
functions can be defined, through assertions, at earlier stages of design.
   Third, by monitoring assertions at run-time we can verify the correctness
of our code. Many programming errors are pointed out immediately by violations
of some assertion. Therefore we can save considerable time in testing and
debugging.
   Fourth, assertions are a means for guaranteeing coherence among different
modules of a large program. We can use them to ensure non contradictory
results of different functions and proper working conditions for each module.
   

Libraries, reusable modules and parameterized types.

   While it is relatively easy in C++ to write functions of classes that
depend on a single underlying type, e.g. the case of class Triangle and class
Rectangle that inherit from a class Polygon, it is, however difficult to write
functions that depend only on the general concepts of a container (sets,
vectors, lists, tables, etc.) rather than on properties of individual
containers [3].
   This is the basic weakness that prevents writing more general reusable
libraries and modules in C++. If polymorphic parameterized classes were
available, these classes could be written just once, and instantiated to the
actual needed types to provide code sharing and reusability.
   C++ may fake parameterized types using macros ([1] par. 7.3.5), but this
approach doesn't work well except on a small scale. In fact lines of thousands
of characters are generated, by the standard C++ preprocessor, even for simple
classes. Moreover use of macros is cumbersome in discovering compile errors
and in debugging.
   Another approach suggested for use of the GNU C++ Library [5], still places
some burden on the programmer who wants to use generic classes. It also fails
to reuse code, as code is duplicated, instead of resorting to the technique of
using generic interfaces to a common implementation as suggested in [1].


Exception handling.

   During program execution a function may encounter a situation where an
error occurs or something strange happens. If the problem is detected, some
attempt to recover a manageable situation may be done, or otherwise a failure
should be reported. If the problem is not detected or ignored the program may
either crash or report incorrect results. In the most lucky cases, things may
be patched up locally, but in many cases the routine that detects the problem
cannot cope with the abnormal situation locally; an exception detected in one
context may require giving up the current activity and performing some action
in another context.
   A number of ad hoc techniques have been until now devised to deal with
exceptional conditions: use of status variables, returning error values back
along the chain of function calls, use of longjmp to reach a context where
things can be recovered directly. This may involve restoring certain
conditions and the problem is often complicated by memory management. In
particular in C++ we have the problem of calling the destructors for all
created and no more needed objects [3].
   In any case coping with a particular kind of exception is tedious but
feasible, coping with all possible exceptions leads to an explosion in code
size and complexity, and is itself an error-prone activity, that is moreover
difficult to test and debug, for the intrinsic reason that exceptions are
usually unlikely situations. Not coping with exceptions (as is too often done)
leads to programs that crash.  Most of the popular languages however, as well
as C++, don't offer any exception handling mechanism or disciplined procedure.


Automatic storage management.

   C++ inherits its memory management scheme from C. Storage for static and
automatic objects is managed automatically, but object allocated in the free
store must be explicitly deallocated, before the space can be reused. Manual
memory management usually provides efficiency in time and space but is a
burden for the programmer since it increases considerably the complexity of
code and is a common source of tricky bugs.
   Automatic garbage collection has not been put as part of the language in
the belief that it is too much expensive. Some schemes indeed, need to
maintain additional information in order to do garbage collection. Therefore
they place an overhead, both in time and space, in an executing program; the
representation of an object in memory takes more space, and accessing the
objects takes more time than in a language without automatic garbage
collection.
   However as Boehm has demonstrated in his work [4], there are also
conservative schemes that don't need any particular assumption, and that don't
pose serious efficiency problems. The most pertinent observation here is that
as long as we can tolerate virtual memory, we should tolerate automatic
garbage collection as well.
   Run-time and memory utilization however are not the only considerations to
have in mind.  If done at a reasonable cost, automatic garbage collection can
yield substantial benefits in design and development time, as well in
debugging and maintenance effort.  It can also be a necessary key feature, in
the design of general reusable modules, and in simplifying the exception
handling problem.  In fact in the former case without a system taking care of
memory management automatically it would not be possible to put responsibility
for it in either the general modules or their users, and in the latter case it
would not be possible to call all the necessary destructors for the objects
created to cope with an abnormal situation, as seen before.  However if
freeing memory is all that destructors have to do (as is often the case), with
automatic garbage collection we don't need to care about the problem.


THE EC++ SOLUTION.


Assertions.

   In EC++ we can express assertions in the form of class invariants and
function pre and post conditions.  A class invariant is defined like any other
member function, but it uses the name "invariant", which is a reserved word in
EC++. Function pre and post conditions must be specified after the declaration
of a function and prior to its body.
   A function precondition is prefixed by >> (reminiscent of an input condition)
and the conditional expression is enclosed in parentheses. Any valid
conditional expression that may be written at the beginning of the body of the
routine is allowed.
   A function postcondition is prefixed by << (reminiscent of an output
condition) and enclosed in parentheses.  Valid expressions are those that may
be expressed prior to every possible return path from the function. Besides
using the variables that are accessible in the scope of the function, an OLD
specification may be used, to mean the value that a certain expression had
upon entering the routine.
    We will refer to some examples describing some of the member functions of
the following (simplified) class:

typedef void* word; 

class WList
{
                            /* A general container with list-like
                            capabilities, implemented as a dynamic array
                            of elements of the same type.
                            It can contain elements that fit in a word. */
friend class WList_iterator;

public:
    WList();
    void    append(word, int);
    void    insert(int, word, int);
    ...
    void    remove(int, int);
    int     length();
    word&   operator[](int);
    int     invariant()         {return len>=0 && len<=size);}

protected:
    short int   len;
    short int   size;
    word*       root;      /* pointer to a vector of words in the free-store */

private:
    word*   resize(int);
    void    grow_or_create_maybe(int);
    void    shrink_or_delete_maybe(int);
    void    shift_forward_from(int);
    void    shift_backward_after(int);
};

   Memory is allocated in chunks of fixed size, when needed. It is worth
noting at this point the invariant of the class, that asserts the fundamental
property that relates the length of an object (i.e. the number of elements
stored) and its size (the allocated storage). If an "invariant" function is
not defined for a class, and invariant checking is activated, a global
"invariant" function (returning always 1) will be called.

The following examples should explain the admissable syntax for function pre
and post conditions:

word* WList::resize(int newsize)

    >>(root != 0 && size > 0 && newsize > 0)

    <<(newptr != 0 && *newptr == OLD(*root))
{
    word* newptr;
    newptr = new word[newsize];
    word* p; word* q;
    int min = (size<newsize) ? size : newsize;
    for(p=root, q=newptr; p-root < min; p++, q++) *q=*p;
    return newptr;
}

void WList::grow_or_create_maybe(int stp)

    >>(    (root != 0 && size > 0)   || 
        (root == 0 && size == 0) )

    <<( root != 0 && size >= OLD(size) )
{
    if( root == 0 ) {                        
                            /* Create!  */
        root = new word[stp];
        size=stp;
    }
    else if( len+1 > size ) {
                            /* Space too small => resize */
        root = resize(size+stp);
        size+=stp;
    }
    len++;
}

   Given the previous definition of the function "resize" the preprocessor produces 
C++ code, that (omitting the #line directives and other details) looks like this:

#define CONTEXT__ "word* WList::resize(int newsize)"

word* WList::resize(int newsize)
{
    PRECONDITION(root != 0 && size > 0 && newsize > 0)
    DECLARE_OLDS( \
    typeof(*root) old__root = *root; \
    )
    word* newptr;
    newptr = new word[newsize];
    word* p; word* q;
    int min = (size<newsize) ? size : newsize;
    for(p=root, q=newptr; p-root < min; p++, q++) *q=*p;
    {
    POSTCONDITION(newptr != 0 && *newptr == old__root)
    INVARIANT()
    return newptr; }
}

   The macros are defined in an include file "globals/ASSERTIONS.h" in such a
way to be expanded as follows if the corresponding condition checking is
requested:

#define PRECONDITION( CONDITION ) \
    if( !(CONDITION) ) badNews(NOT_PRECOND, CONTEXT__);

#define DECLARE_OLDS( OLDS ) \
    OLDS 

#define INVARIANT() \
    if( !invariant() ) badNews(NOT_INVAR, CONTEXT__);

#define POSTCONDITION( CONDITION ) \
    if( !(CONDITION) ) badNews(NOT_POSTCOND, CONTEXT__); \
    INVARIANT();

   One has separate control on each case defining the macro names:
       CHECK_PRECONDITION CHECK_POSTCONDITION CHECK_INVARIANT 
   If these macros are not defined the corresponding condition checking macro
is expanded in a blank, and therefore no code is generated.
   If some assertion checking is activated and an assertion fails to be
verified, the function "badNews" is called. In the simplest case this function
limits itself to printing some error information along with the name of the
function where the failure has been detected (this is passed in the variable
CONTEXT__), but in general it may do better if an exception handling procedure
is defined, as we will see later.


Support for parametric classes.

   Given the previous definition of the container class WList, a parametric
interface to it may be provided to use the container for different kind of
objects, following the scheme at par. 7.3.5 in [1]. The generic interface
class may be written in dependence of a parameter type <T> as follows:

#typedef
#include "containers/WList.h"

extern int      step<T>;
extern void     initializeWList<T>(int);

struct WList<T> : WList 
{
                            /* A generic parameterized container 
                            with list-like capabilities, implemented as a 
                            dynamic array of elements of type <T>.
                            Elements of type <T> must fit in a word. */
friend class WList_iterator<T>;

    WList<T>() : WList() {}
    void    append(<T> w) 
                {WList::append((word)w, step<T>);}
    void    insert(int i, <T> w)
                {WList::insert(i, (word)w, step<T>);}
    ...        ...
    void    remove(int i)
                {WList::remove(i, step<T>);}
    int     length()
                {return WList::length();}
    <T>&    operator[](int i)
                {return (<T>&) WList::operator[](i);}
};


   Note the "#typedef" keyword, that will be used later in the instantiation
of this parametric definition, and the use of generic identifiers, depending
on the parameter <T>.
   This generic class may be used as follows:


#include <String.h>
#include "generics/WList<String*>.h"

struct Sentence : WList<String*>
{
    Sentence();
    void input_sentence();
    void print_sentence();
    ...
private:
    void print_current(String*);
};


Sentence::Sentence() : WList<String*>() {};

void Sentence::input_sentence() 
{
    String* s;
    do {
        ...
        append(s);

    } while(*s != ".");
    remove(len-1);
}

   Parametric classes may therefore be instantiated to a particular type and
parametric identifiers may be used. An arbitrary type (given certain
constraints) may be used and placed between "<...>" in all the places where a
match with the corresponding generic parameter <T> in the generic class is
possible.
   In order to get this working no manual operation is necessary, the
preprocessor of EC++ will take care of all that is needed:

   1) First of all the conventional expressions of a parametric identifier
must be expanded in the source files that use parametric classes, in order to
obtain valid C++ identifiers for the parametric names.  In our case the source
files are transformed this way:


#include <String.h>
#include "generics/WListString_.H"

struct Sentence : WListString_
{
    Sentence();
    void input_sentence();
    void print_sentence();
    ...
private:
    void print_current(String*);
};


Sentence::Sentence() : WListString_() {};

void Sentence::input_sentence() 
{
    String* s;
    do 
    {
        ...
        append(s);

    } while(*s != ".");
    remove(len-1);
}


The substring "String_" take the place of "<String*>" in the parametric names,
and is concatenated with the rest of the name.
   As is shown in the example, besides simple types, pointer types may
constitute valid parameter types for EC++. In the actual implementation more
complex parameter type will require a typedef statement, but these should be
quite uncommon cases.

   2) We have to generate the files that define the required instance of the
generic class. In the previous example these files are
"generics/WListString_.h" and "generics/WListString_.cc" (the suffixes .H and
.C will then placed by the final preprocessing step). In our example the code
of the instantiated interface class, generated by the preprocessor, is:


// WListString_.h:

typedef String* String_;
#include "containers/WList.h"
extern int      stepString_;
extern void     initializeWListString_(int);

struct WListString_ : WList 
{
                            /* A container with list-like capabilities, 
                            implemented as a dynamic array of elements 
                            of type String_.
                            Type String_ must fit in a word. */
friend class WList_iteratorString_;

    WListString_() : WList() {}
    void     append(String_ w) 
                {WList::append((word)w, stepString_);}
    void     insert(int i, String_ w)
                {WList::insert(i, (word)w, stepString_);}
    ...        ...
    void     remove(int i)
                {WList::remove(i, stepString_);}
    int     length()
                {return WList::length();}
    String_&     operator[](int i)
                {return (String_&) WList::operator[](i);}
};


// WListString_.cc:

#include "WListString_.h"
int      stepString_;
void     initializeWListString_(int step) {stepString_ = step;}


    To automatically produce these files, the EC++ preprocessor edits the
makefiles that perform the preprocessing and compilation steps in the
directory where the generic class is defined. The user should initially
provide only the basic pattern (which is part of the EC++ distibution) for the
makefiles in this directory.
    Processing a file that uses a parametric class, in case a new instance is
required, EC++ adds new targets to the makefiles, and instructions to produce
the new files via text substitution from the original generic class
definition.
   In order to make the process of getting an executable system coherent, it
is assumed, using EC++, that the make process is split into two steps: a
preprocess step that eventually generates some new instances and produces all
the necessary transformations on the original source files, and a final
compilation and linking step.


Support for exception handling.

   EC++ provides a mechanism for dealing with exceptions. The following are
considered exceptions:
 1) failed assertions;
 2) hardware or system generated signals;
 3) detection, in the user code, of an abnormal condition that cause an
explict call to the "Help!" facility.  
   The construct "Help!" is reserved and it causes execution of the code of
the active "rescue clause".
   In EC++ a rescue clause may be specified after the declaration and prior to
the body of a function (like assertions). It consists of a block of statements
that could be allowed at the beginning of the function body. 
   Let us suppose we have a certain number of functions for which a rescue
clause has been specifed. During program execution the active rescue clause is
the last seen in the stack. If at a certain point an exception is detected,
arising any of the exceptional conditions above, then control is transferred
to the code of the actual active rescue clause. This code may specify also a
program exit, if the situation is considered unrecoverable. If no tranfer
of control is specified, normally the routine body is reentered from the
beginning, and execution is retried. 
   The active rescue clause may not belong to the current executing function,
therefore an exception may involve unraveling the stack until the function
where the rescue is specified.  This may be dangerous, it involves a longjmp
and is accompanied by the problem of the destructors that will not be called.
The standard assumption of EC++, however, is that automatic garbage collection
is active, and this virtually eliminate the destructors problem. In any case,
very often the exception cannot be resolved locally in the function where it
is detected, so this feature is necessary in many situations.
   The following are two very simple examples of the admissable syntax and use
of the rescue clause and help invocation (they don't intend to suggest doing
things this way, but are just illustrative):


extern double inverse(double x);

main()
    rescue:3
    {}
{
    double x;
    cin>>x;
    cout<<"\n"<<inverse(x)<<"\n";
}


double inverse(double x)
    >>(x!=0)
{
    return 1/x;
}


   In this example an exception will be detected by the precondition assertion
of the function "inverse" if 0 is given as input. The effect will be to
unravel the function from the stack and execute the rescue clause of the main,
which in this case does nothing besides reinitiating the program and asking
for a new input value.
   The integer number after the colon following the rescue keyword is the
number of times that rescue is allowed. If the program doesn't come out with a
correct situation after retrying that number of times the program will exit
reporting a failure (i.e. an error condition).
   This is (approximately) the code that we get in output from the translator:


extern double inverse(double x);

#define CONTEXT__ "main()"

main()
{
    jmp_buf rescue_code;
    Rescue(&rescue_code, 3);
    if(setjmp(rescue_code))
    {}

    double x;
    cin>>x;
    cout<<"\n"<<inverse(x)<<"\n";
}


#define CONTEXT__ "double inverse(double x)"

double inverse(double x)
{
    PRECONDITION(x!=0)

    return 1/x;
}

   The class "Rescue" supports the required operations in conjunction with the
function "badNews":
 1) Pointers to jump locations of rescue instructions are maintained in a
stack list.
 2) A global variable "rescue__" points to the top of the stack and represent
the active rescue clause at any given moment.
 3) If a rescue clause is active (i.e. the Rescue stack is not empty) and an
exception occurs, the function badNews calls for a longjmp to the rescue code.
 4) The number of residual chances after a rescue is executed is updated.

   In the second example the rescue clause is local to the routine "inverse"
and is invoked explicitly by the "Help!" instruction.

main()
{
    double x;
    cin>>x;
    cout<<"\n"<<inverse(x)<<"\n";
}


double inverse(double x)
    rescue:1{x=1e-30;}
{
    if(x==0) Help!;
    return 1/x;
}

   An exception always involves a call to "badNews". It is worthwhile to point
out that, besides being raised by assertion checking and explicit "Help!"
invocation, an exception may be caused by an error detected by the hardware or
the operating system. This usually causes a signal to be generated, e.g.
floating point exceptions or segmentation violation signals.  The default
initialization phase of an EC++ application makes provisions to trap these
signals at any time and therefore also in these cases an exception is detected
and the function "badNews" is called.
    The method is relatively simple but with this basic framework more complex
behaviors, when needed, can also be simulated. Extension of this mechanism, in
order to make it more flexible, could be considered depending on practical
experience indications.


Support for garbage collection.

   EC++ may be used without garbage collection, but in the kind of application
it is intended for, as well as for avoiding some of the difficulties
encountered in exception handling and in writing rescue clauses, I suggest
using it, and writing code making the assumption that garbage collection is
active. The Boehm's garbage collector [4] is used and therefore the support
provided by EC++ is just a couple of files to interface the Boehm's package
with C++.
   It is worth noting that, using automatic garbage collection, a very
different style of programming can be adopted. All the complications of memory
management that we encounter when we need multiple reference to the same
object and that usually require manual reference counting suddenly disappear.
Also the difference in semantics of assignment and initialization, that took a
chapter in the Stroustrup's book [1], virtually disappears and we can use
pointer assignment without troubles.


Some notes on the environment.

   EC++ assumes work is in a directory tree with a root that is passed to the
EC++ preprocessor as a parameter. Files are searched in this tree. The
following assumptions are also either necessary or recommended:
 1) Original source files have suffixes ".cc" and ".h".
 2) Each directory that contains source files has two separate makefiles:
"preprocess.mk" and "compile.mk". The former applies the preprocessor to the
original source files and produces the corresponding processed files with
suffixes ".C" and ".H". These files contain #line directives to the original
files.
 3) To produce an executable file or a set of coherent object files, all the
preprocess makefiles are executed first in all directories, in order to have
all the include files preprocessed and updated, as well as all instances of
generic classes generated, then the compilations are performed.
   EC++ uses the "typeof" feature, that I think is not yet implemented in AT&T
2.0, but that is available in GNU g++. Therefore use of g++, or of other C++
compilers (if any) with equivalent features, is assumed.
   The current implementation of EC++ also uses classes of the GNU g++
library.  You should therefore have it available in order to compile and load
the files of the EC++ distribution.


POSSIBLE FUTURE DEVELOPMENTS.

   EC++ is currently being used and tested in this implementation. Some future
development (as we can foresee at this time) may include support for:
 1) debug instructions and assertion checking in any needed location;
 2) atomic allocation [4] to improve garbage collection performance;
 3) a more flexible exception handling mechanism;
 4) automatic production of documentation.


REFERENCES.

[1] - B. Stroustrup, "The C++ Programming Language", Addison-Wesley, 1986. 
[2] - B. Meyer, "Object-Oriented Software Construction", Prentice-Hall, 1988.
[3] - B. Stroustrup, "Possible Directions for C++", Proc. USENIX, Nov. 1987. 
[4] - H.J. Boehm, "Garbage Collection in an Uncooperative Environment",
      Software Practice and Experience, Vol. 18, pp. 807-820, Sept. 1988. 
[5] - D. Lea, "User's Guide to GNU C++ Library", May 1989.