Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!usc!masotti
From: masotti@usc.edu (Glauco Masotti)
Newsgroups: comp.lang.c++
Subject: EC++.TUTORIAL
Message-ID:
Date: 19 Oct 89 16:47:30 GMT
Sender: news@usc.edu
Distribution: comp.lang.c++, comp.object
Organization: University of Southern California
Lines: 682

A TUTORIAL INTRODUCTION TO EC++: EXTENDED C++

by Glauco Masotti(*)
University of Southern California, Los Angeles
September, 1989.

(*) The Author is also with Ferrari Engineering, Modena, Italy.

SUMMARY.

EC++ stands for "Extended C++" and is an attempt to overcome some of the difficulties encountered in working with C++, and to make programming large systems somewhat more enjoyable. EC++ implements some extensions to the C++ language and provides an environment to support:
- assertion checking, in the form of function preconditions and postconditions, and class invariants;
- parameterized classes;
- exception handling;
- garbage collection.

The extensions to the language are translated by a preprocessor (which is itself called EC++) into plain C++. The overall environment is set up with a small number of additional files that define some required macros and classes that will be used by the applications.

MOTIVATIONS: PROBLEMS AND WEAKNESSES OF C++.

Assertions.

Assertions are essentially logic statements that should be verified at certain points in the code. Examples are invariants that must be satisfied by objects of a class at all times; preconditions that must be satisfied whenever a routine (function of any type) is called; and postconditions that must be guaranteed to be true after routine completion. A good presentation of the use of assertions can be found in [2]. No support is provided in C++ to express assertions in a disciplined way, and no emphasis has been put on them by other languages, except Eiffel.
However, judging from my past experience with large software projects, and my present experience with EC++, assertions are very useful for writing correct and robust software. Assertions have to be considered an integral part of the source code.

First of all, they document the contract that controls the relationship between a function and its users: the precondition binds the users, the postcondition binds the function. This makes clear what the assumed working conditions for a function are, and whose responsibility it is to provide or ascertain the required conditions. Making software coherent with the specified assertions will therefore require the necessary checks, but it also eliminates the checking tests scattered widely throughout the code, which is what happens when the contract is not specified.

Second, they enable us to express formal properties of classes and routines. This use of assertions may help particularly in the high-level design phase of a system, in combination with the use of abstract or virtual functions. Details of implementation will be provided only later, but requirements and properties of fundamental classes and relevant effects of functions can be defined, through assertions, at earlier stages of design.

Third, by monitoring assertions at run-time we can verify the correctness of our code. Many programming errors are pointed out immediately by the violation of some assertion. Therefore we can save considerable time in testing and debugging.

Fourth, assertions are a means of guaranteeing coherence among the different modules of a large program. We can use them to ensure non-contradictory results of different functions and proper working conditions for each module.

Libraries, reusable modules and parameterized types.

While it is relatively easy in C++ to write functions of classes that depend on a single underlying type, e.g.
the case of class Triangle and class Rectangle that inherit from a class Polygon, it is, however, difficult to write functions that depend only on the general concepts of a container (sets, vectors, lists, tables, etc.) rather than on properties of individual containers [3]. This is the basic weakness that prevents writing more general reusable libraries and modules in C++. If polymorphic parameterized classes were available, these classes could be written just once, and instantiated to the types actually needed, to provide code sharing and reusability.

C++ may fake parameterized types using macros ([1] par. 7.3.5), but this approach does not work well except on a small scale. In fact, lines of thousands of characters are generated by the standard C++ preprocessor, even for simple classes. Moreover, the use of macros makes it cumbersome to track down compile errors and to debug. Another approach, suggested for use with the GNU C++ Library [5], still places some burden on the programmer who wants to use generic classes. It also fails to reuse code, as code is duplicated, instead of resorting to the technique of using generic interfaces to a common implementation, as suggested in [1].

Exception handling.

During program execution a function may encounter a situation where an error occurs or something strange happens. If the problem is detected, some attempt may be made to recover a manageable situation, or otherwise a failure should be reported. If the problem is not detected, or is ignored, the program may either crash or report incorrect results. In the luckiest cases things may be patched up locally, but in many cases the routine that detects the problem cannot cope with the abnormal situation locally; an exception detected in one context may require giving up the current activity and performing some action in another context.
A number of ad hoc techniques have been devised so far to deal with exceptional conditions: the use of status variables, returning error values back along the chain of function calls, and the use of longjmp to reach directly a context where things can be recovered. This may involve restoring certain conditions, and the problem is often complicated by memory management. In particular, in C++ we have the problem of calling the destructors for all the objects that were created and are no longer needed [3]. In any case, coping with a particular kind of exception is tedious but feasible; coping with all possible exceptions leads to an explosion in code size and complexity, and is itself an error-prone activity that is, moreover, difficult to test and debug, for the intrinsic reason that exceptions are usually unlikely situations. Not coping with exceptions (as is too often done) leads to programs that crash. Most of the popular languages, however, including C++, do not offer any exception handling mechanism or disciplined procedure.

Automatic storage management.

C++ inherits its memory management scheme from C. Storage for static and automatic objects is managed automatically, but objects allocated in the free store must be explicitly deallocated before the space can be reused. Manual memory management usually provides efficiency in time and space, but it is a burden for the programmer, since it considerably increases the complexity of the code and is a common source of tricky bugs.

Automatic garbage collection has not been made part of the language in the belief that it is too expensive. Some schemes, indeed, need to maintain additional information in order to do garbage collection. Therefore they impose an overhead, both in time and space, on an executing program; the representation of an object in memory takes more space, and accessing the objects takes more time, than in a language without automatic garbage collection.
However, as Boehm has demonstrated in his work [4], there are also conservative schemes that do not need any particular assumption, and that do not pose serious efficiency problems. The most pertinent observation here is that as long as we can tolerate virtual memory, we should tolerate automatic garbage collection as well.

Run-time and memory utilization, however, are not the only considerations to keep in mind. If done at a reasonable cost, automatic garbage collection can yield substantial benefits in design and development time, as well as in debugging and maintenance effort. It can also be a necessary key feature in the design of general reusable modules, and in simplifying the exception handling problem. In fact, in the former case, without a system taking care of memory management automatically, it would not be possible to put responsibility for it in either the general modules or their users; and in the latter case it would not be possible to call all the necessary destructors for the objects created, in order to cope with an abnormal situation, as seen before. However, if freeing memory is all that destructors have to do (as is often the case), with automatic garbage collection we do not need to care about the problem.

THE EC++ SOLUTION.

Assertions.

In EC++ we can express assertions in the form of class invariants and function pre and post conditions. A class invariant is defined like any other member function, but it uses the name "invariant", which is a reserved word in EC++. Function pre and post conditions must be specified after the declaration of a function and prior to its body. A function precondition is prefixed by >> (reminiscent of an input condition) and the conditional expression is enclosed in parentheses. Any valid conditional expression that may be written at the beginning of the body of the routine is allowed. A function postcondition is prefixed by << (reminiscent of an output condition) and enclosed in parentheses.
Valid expressions are those that may be expressed prior to every possible return path from the function. Besides using the variables that are accessible in the scope of the function, an OLD specification may be used, to mean the value that a certain expression had upon entering the routine.

We will refer to some examples describing some of the member functions of the following (simplified) class:

    typedef void* word;

    class WList {
    /* A general container with list-like capabilities,
       implemented as a dynamic array of elements of the same type.
       It can contain elements that fit in a word. */
        friend class WList_iterator;
    public:
        WList();
        void append(word, int);
        void insert(int, word, int);
        ...
        void remove(int, int);
        int length();
        word& operator[](int);
        int invariant() {return len>=0 && len<=size;}
    protected:
        short int len;
        short int size;
        word* root;   /* pointer to a vector of words in the free-store */
    private:
        word* resize(int);
        void grow_or_create_maybe(int);
        void shrink_or_delete_maybe(int);
        void shift_forward_from(int);
        void shift_backward_after(int);
    };

Memory is allocated in chunks of fixed size, when needed. It is worth noting at this point the invariant of the class, which asserts the fundamental property that relates the length of an object (i.e. the number of elements stored) and its size (the allocated storage). If an "invariant" function is not defined for a class, and invariant checking is activated, a global "invariant" function (always returning 1) will be called.

The following examples should explain the admissible syntax for function pre and post conditions:

    word* WList::resize(int newsize)
    >>(root != 0 && size > 0 && newsize > 0)
    <<(newptr != 0 && *newptr == OLD(*root))
    {
        word* newptr;
        newptr = new word[newsize];
        word* p;
        word* q;
        int min = (size>( (root != 0 && size > 0) || (root == 0 && size == 0) )
    <<( root != 0 && size >= OLD(size) )
    {
        if( root == 0 ) { /* Create!
 */
            root = new word[stp];
            size=stp;
        }
        else if( len+1 > size ) { /* Space too small => resize */
            root = resize(size+stp);
            size+=stp;
        }
        len++;
    }

Given the previous definition of the function "resize", the preprocessor produces C++ code that (omitting the #line directives and other details) looks like this:

    #define CONTEXT__ "word* WList::resize(int newsize)"
    word* WList::resize(int newsize)
    {
        PRECONDITION(root != 0 && size > 0 && newsize > 0)
        DECLARE_OLDS( \
            typeof(*root) old__root = *root; \
        )
        word* newptr;
        newptr = new word[newsize];
        word* p;
        word* q;
        int min = (size

as follows:

    #typedef
    #include "containers/WList.h"
    extern int step;
    extern void initializeWList(int);

    struct WList : WList {
    /* A generic parameterized container with list-like capabilities,
       implemented as a dynamic array of elements of type .
       Elements of type must fit in a word. */
        friend class WList_iterator;
        WList() : WList() {}
        void append( w) {WList::append((word)w, step);}
        void insert(int i, w) {WList::insert(i, (word)w, step);}
        ...
        ...
        void remove(int i) {WList::remove(i, step);}
        int length() {return WList::length();}
        & operator[](int i) {return (&) WList::operator[](i);}
    };

Note the "#typedef" keyword, that will be used later in the instantiation of this parametric definition, and the use of generic identifiers, depending on the parameter . This generic class may be used as follows:

    #include
    #include "generics/WList<String*>.h"

    struct Sentence : WList<String*> {
        Sentence();
        void input_sentence();
        void print_sentence();
        ...
    private:
        void print_current(String*);
    };

    Sentence::Sentence() : WList<String*>() {};

    void Sentence::input_sentence()
    {
        String* s;
        do {
            ...
            append(s);
        } while(*s != ".");
        remove(len-1);
    }

Parametric classes may therefore be instantiated to a particular type and parametric identifiers may be used. An arbitrary type (given certain constraints) may be used and placed between "<...>" in all the places where a match with the corresponding generic parameter in the generic class is possible.
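As a rough illustration of the idea of a typed interface over a common untyped implementation, here is a minimal sketch in plain C++, written by hand rather than generated by EC++. The names WordList, Token and WordListToken_ are hypothetical and are not part of EC++. The fixed chunk size of 4 stands in for the per-instance step variable, and the typed operator[] returns the element by value (rather than by reference) to keep the cast simple.

```cpp
#include <cassert>

typedef void* word;

// Untyped common implementation (a simplified, hypothetical stand-in for WList).
class WordList {
public:
    WordList() : len(0), size(0), root(0) {}
    ~WordList() { delete[] root; }

    // Append one word, growing the storage by a chunk of `step` words when full.
    void append(word w, int step) {
        assert(step > 0);
        if (len == size) {
            word* bigger = new word[size + step];
            for (int i = 0; i < len; i++) bigger[i] = root[i];
            delete[] root;
            root = bigger;
            size += step;
        }
        root[len++] = w;
    }

    int length() const { return len; }
    word& operator[](int i) { assert(i >= 0 && i < len); return root[i]; }

    // Same property as the class invariant discussed in the text.
    int invariant() const { return len >= 0 && len <= size; }

private:
    WordList(const WordList&);             // copying is not supported in this sketch
    WordList& operator=(const WordList&);
    int len;
    int size;
    word* root;
};

// Hypothetical element type; a pointer to it fits in a word.
struct Token { int id; };
typedef Token* Token_;

// Typed interface: only casts are added, all the real work is shared code.
struct WordListToken_ : WordList {
    void append(Token_ t) { WordList::append((word)t, 4); }
    Token_ operator[](int i) { return static_cast<Token_>(WordList::operator[](i)); }
};
```

A WordListToken_ object can then be filled with append(&some_token) and read back with list[i]->id. Every such typed wrapper reuses the single compiled copy of WordList, which is the code-sharing property the text attributes to generic interfaces over a common implementation.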
No manual operation is necessary to make parametric classes work; the EC++ preprocessor takes care of all that is needed:

1) First of all, the conventional expressions of a parametric identifier must be expanded in the source files that use parametric classes, in order to obtain valid C++ identifiers for the parametric names. In our case the source files are transformed this way:

    #include
    #include "generics/WListString_.H"

    struct Sentence : WListString_ {
        Sentence();
        void input_sentence();
        void print_sentence();
        ...
    private:
        void print_current(String*);
    };

    Sentence::Sentence() : WListString_() {};

    void Sentence::input_sentence()
    {
        String* s;
        do {
            ...
            append(s);
        } while(*s != ".");
        remove(len-1);
    }

The substring "String_" takes the place of "<String*>" in the parametric names, and is concatenated with the rest of the name. As is shown in the example, besides simple types, pointer types may constitute valid parameter types for EC++. In the current implementation more complex parameter types will require a typedef statement, but these should be quite uncommon cases.

2) We have to generate the files that define the required instance of the generic class. In the previous example these files are "generics/WListString_.h" and "generics/WListString_.cc" (the suffixes .H and .C will then be placed by the final preprocessing step). In our example the code of the instantiated interface class, generated by the preprocessor, is:

    // WListString_.h:
    typedef String* String_;
    #include "containers/WList.h"
    extern int stepString_;
    extern void initializeWListString_(int);

    struct WListString_ : WList {
    /* A container with list-like capabilities, implemented as a
       dynamic array of elements of type String_.
       Type String_ must fit in a word. */
        friend class WList_iteratorString_;
        WListString_() : WList() {}
        void append(String_ w) {WList::append((word)w, stepString_);}
        void insert(int i, String_ w) {WList::insert(i, (word)w, stepString_);}
        ...
        ...
        void remove(int i) {WList::remove(i, stepString_);}
        int length() {return WList::length();}
        String_& operator[](int i) {return (String_&) WList::operator[](i);}
    };

    // WListString_.cc:
    #include "WListString_.h"
    int stepString_;
    void initializeWListString_(int step) {stepString_ = step;}

To produce these files automatically, the EC++ preprocessor edits the makefiles that perform the preprocessing and compilation steps in the directory where the generic class is defined. The user should initially provide only the basic pattern (which is part of the EC++ distribution) for the makefiles in this directory. When processing a file that uses a parametric class, if a new instance is required, EC++ adds new targets to the makefiles, together with instructions to produce the new files via text substitution from the original generic class definition. In order to make the process of getting an executable system coherent, it is assumed, when using EC++, that the make process is split into two steps: a preprocessing step that possibly generates some new instances and performs all the necessary transformations on the original source files, and a final compilation and linking step.

Support for exception handling.

EC++ provides a mechanism for dealing with exceptions. The following are considered exceptions:
1) failed assertions;
2) hardware or system generated signals;
3) detection, in the user code, of an abnormal condition that causes an explicit call to the "Help!" facility.

The construct "Help!" is reserved, and it causes execution of the code of the active "rescue clause". In EC++ a rescue clause may be specified after the declaration and prior to the body of a function (like assertions). It consists of a block of statements that would be allowed at the beginning of the function body.

Let us suppose we have a certain number of functions for which a rescue clause has been specified. During program execution the active rescue clause is the last seen in the stack.
If at a certain point an exception is detected, arising from any of the exceptional conditions above, then control is transferred to the code of the currently active rescue clause. This code may also specify a program exit, if the situation is considered unrecoverable. If no transfer of control is specified, normally the routine body is reentered from the beginning, and execution is retried.

The active rescue clause may not belong to the currently executing function, therefore an exception may involve unwinding the stack up to the function where the rescue is specified. This may be dangerous: it involves a longjmp and is accompanied by the problem of the destructors that will not be called. The standard assumption of EC++, however, is that automatic garbage collection is active, and this virtually eliminates the destructors problem. In any case, very often the exception cannot be resolved locally in the function where it is detected, so this feature is necessary in many situations.

The following are two very simple examples of the admissible syntax and use of the rescue clause and help invocation (they are not intended to suggest doing things this way, but are just illustrative):

    extern double inverse(double x);

    main()
    rescue:3 {}
    {
        double x;
        cin>>x;
        cout<<"\n"<>(x!=0)
    {
        return 1/x;
    }

In this example an exception will be detected by the precondition assertion of the function "inverse" if 0 is given as input. The effect will be to unwind the function from the stack and execute the rescue clause of main, which in this case does nothing besides restarting the program and asking for a new input value. The integer number after the colon following the rescue keyword is the number of times that rescue is allowed. If the program does not reach a correct state after retrying that number of times, it will exit reporting a failure (i.e. an error condition).
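The retry behaviour described above can be approximated in plain C++ with setjmp/longjmp. The following is only a sketch under stated assumptions, not the EC++ run-time: the names active_rescue, help and retry_inverse are hypothetical, the inputs come from an array instead of cin, and only trivially destructible objects are live across the jump (which sidesteps the destructor problem mentioned in the text).

```cpp
#include <cassert>
#include <csetjmp>
#include <cstdlib>

// Hypothetical stand-in for the run-time state: the innermost active rescue point.
static std::jmp_buf* active_rescue = nullptr;

// Hypothetical stand-in for "Help!": transfer control to the active rescue clause.
void help()
{
    if (active_rescue == nullptr) std::abort();   // no rescue clause on the stack
    std::longjmp(*active_rescue, 1);
}

double inverse(double x)
{
    if (!(x != 0)) help();   // plays the role of the precondition >>(x!=0)
    return 1 / x;
}

// Plays the role of a function declared with "rescue:max_rescues {}".
// Each attempt consumes the next input value. Returns 1 and stores the
// inverse in *result on success; returns 0 if the allowed number of
// rescues is exceeded or the inputs run out.
int retry_inverse(const double* inputs, int n, int max_rescues, double* result)
{
    assert(inputs != nullptr && result != nullptr);
    std::jmp_buf rescue_code;
    std::jmp_buf* const outer = active_rescue;   // enclosing rescue clause, if any
    volatile int rescues = 0;   // volatile: changed between setjmp and longjmp
    volatile int next = 0;

    active_rescue = &rescue_code;
    if (setjmp(rescue_code) != 0) {
        // The (empty) rescue clause body would go here; then the body is retried.
        rescues = rescues + 1;
        if (rescues > max_rescues) { active_rescue = outer; return 0; }
    }

    // Function body, reentered from the beginning after each rescue.
    if (next >= n) { active_rescue = outer; return 0; }
    double x = inputs[next];
    next = next + 1;
    *result = inverse(x);

    active_rescue = outer;   // leaving the function: pop this rescue clause
    return 1;
}
```

With the inputs {0, 0, 4} and three allowed rescues, the first two attempts jump back to the rescue point and the third succeeds. With four zeros the fourth failure exceeds the limit and the function reports failure instead of retrying again.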
This is (approximately) the code that the translator produces as output for the "inverse" example:

    extern double inverse(double x);
    #define CONTEXT__ "main()"
    main()
    {
        jmp_buf rescue_code;
        Rescue(&rescue_code, 3);
        if(setjmp(rescue_code)) {}
        double x;
        cin>>x;
        cout<<"\n"<>x;
        cout<<"\n"<