From: rfg@lupine.ncd.com (Ron Guilmette)
Newsgroups: comp.lang.c++,comp.std.c++
Subject: Smart pointers and Stupid people (my reactions and a new idea)
Message-ID: <3348@lupine.NCD.COM>
Date: 13 Jan 91 03:38:58 GMT
Followup-To: comp.lang.c++
Organization: Network Computing Devices, Inc., Mt. View, CA

Now for my detailed comments on the "smart pointer problem" discussion so far. (This is where it really starts to get biased! :-)

--------------------------------------------------------------------------

I believe that the concerns expressed by Andrew Ginter are, for the most part, non-issues. I don't see where functions which return pointers (either smart or stupid ones) need to cause us any special concerns. Likewise for temporaries and expression evaluation ordering. The `this' pointer is worthy of note only in that it must have type T* and thus, the type T* should be unrestricted wherever `this' is accessible.

Taking the address of a data member of an object of type T need not cause us any special concern either, because the value yielded by this use of the unary & operator will be of some pointer-to-member-of-T type, which cannot subsequently be used in isolation. Rather, any such pointer-to-member-of-T may only be used in conjunction with honest-to-goodness pointers to objects of the type T, and if these are maintained correctly then all will work out just fine.

Likewise, Tim Atkins' concerns are (I believe) misplaced. I don't think that it is necessary to have *all* pointers to some type T be smart in order for a program containing objects of type T to be useful.
Quite the contrary, it seems to me that for any pointed-at type (T) you may want to use smart pointers (to T) in most places and you will absolutely have to use stupid pointers to T in certain (limited) places.

Additionally, I don't see where low level implementation-specific details (e.g. the code that cfront generates) need to enter into this discussion unless cfront has bugs that become apparent when we are fiddling with smart (or stupid) pointers. The issue of T*'s in registers also seems unrelated, unless of course our garbage collector can be triggered into action asynchronously (e.g. as the result of a signal). In that case, it may be wise to declare all of our stupid pointers to be volatile (so that we don't get into memory/register synchronization problems) but that is all unrelated to the point of this discussion.

I believe that both Peter Grandi and Jim Adcock are saying that we need to restrict the use of the type T* at run-time via run-time mechanisms. If so, I disagree with both of them. I feel that we ought to be able to do something at compile-time, where the performance cost is not so high as it is for things done at run-time.

Henry Cobb's idea to make all constructors for type T private is somewhat similar to Marshall Cline's suggestion of nesting the declaration (and definition) of the class T within a smart_pointer_to_T class. In both cases, the idea seems to be to restrict the ability to create objects of type T to some particular (limited) set of lexical scopes (all of which are under complete control of the smart_pointer_to_T type).

To varying degrees, these two proposals solve the "smart pointer problem" by making the type T unknown to the outside world. (In the case of Henry's proposal, the whole program could at least say `sizeof(T)' whereas in Marshall's proposal, even that would be illegal outside of the encapsulating outer class.)
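
To make the private-constructor variant concrete, here is a minimal sketch of how I read Henry Cobb's suggestion. It is an illustration under assumptions, not anybody's actual code: the member names (value_, value()) and the helper read_through_smart_pointer() are hypothetical, and the C++11 facilities used to check the result (<type_traits>, `= delete') post-date this discussion.

```cpp
#include <cassert>
#include <type_traits>

class smart_pointer_to_T;

class T {
    friend class smart_pointer_to_T;   // only the smart pointer may create a T
    int value_;
    explicit T(int v) : value_(v) {}   // private: outsiders cannot construct a T
    T(const T&) = delete;              // and cannot copy one either
public:
    int value() const { return value_; }
};

// Hypothetical smart pointer: it owns the only T it hands out access to.
class smart_pointer_to_T {
    T* raw_;
public:
    explicit smart_pointer_to_T(int v) : raw_(new T(v)) {}  // allowed: friend of T
    ~smart_pointer_to_T() { delete raw_; }
    smart_pointer_to_T(const smart_pointer_to_T&) = delete;
    smart_pointer_to_T& operator=(const smart_pointer_to_T&) = delete;
    // Forward the operation instead of exposing a T* to the caller.
    int value() const { return raw_->value(); }
};

// Code outside T and its friend sees T as non-constructible.
constexpr bool outsiders_can_construct_T = std::is_constructible<T, int>::value;

// Hypothetical helper: every use of a T goes through the smart pointer.
inline int read_through_smart_pointer(int v) {
    smart_pointer_to_T p(v);
    return p.value();
}
```

Note that with this arrangement a plain declaration such as `T t(1);' at file scope or inside an ordinary function is rejected by the compiler, which is exactly the cost of hiding T.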
Anyway, these two proposals succeed by hiding the type T from those who would attempt to use it directly, and by forcing such potential users to ask for assistance from the smart_pointer_to_T type in order to do anything (including creation and destruction) with an object of type T. These solutions have definite merits, but there is a downside to hiding the type T. (More on this later.)

The solution proposed by Bob Martin and (independently) also by Jeremy Grodberg, to allow the type T* to be treated like a class (which can be declared and which can have member functions and operators defined for it), is clever and I had myself considered it; however, I fear that Bjarne will never like it. The reason? Well, it makes the language "mutable" (in Stroustrup's terms).

One early (and related) idea which I had some time ago for solving this "smart pointer" problem was to allow stuff like:

	T*& operator= (T*, T*&);
	T operator* (T*);

In effect, I wanted to let the user just redefine the meaning of = and (unary) * for plain old pointer types. If you could do that, then you could be in complete control of all operations done with stupid pointers. That idea was almost the same as allowing:

	class T* {
	public:
		T*& operator= (T*&);
		T operator* ();
	};

But in both cases, you are allowing the user to change the existing meaning of things whose meaning is already well defined in the language (e.g. the meaning of unary * when applied to a pointer type value). Bjarne doesn't want to open that Pandora's box. I tend to feel that this one important case (of pointer types) might warrant a bit of "mutability" being allowed to stick its nose into the tent, but it doesn't much matter what I think. I doubt that Bjarne will have any part of it.

Of all of these ideas, I think that I like Marshall Cline's the best. It certainly has good prospects of being implemented widely so that we can all start to use it soon.
After all, it relies only on features of C++ which are already described in current drafts of the x3j16 working documents! In effect, nested classes are already "in" the standard. (I hope nobody in x3j16 kills me for having said that.)

Likewise, Henry Cobb's idea (to make all constructors for T private and then to just make functions and classes which actually have to create T's into friends of T) is a good solution which ought to work even with current implementations.

I do see some problems with these two ideas however.

First and foremost, by using either of these approaches, I have to give up the ability (which I would otherwise have) to simply declare an object of type T as a storage-class `static' file-scope variable, or as a storage-class `auto' variable (local to a function), or even as a member. I don't like that one bit! Just because I want the use of T*'s to be restricted does not mean that I also want to be restricted in what I can do with a T. Gosh darn it! I want my cake and I want to eat it too!

Another problem with both Marshall's idea and with Henry's idea is that they both require me to put the entire *definition* of the (controlled) class T into header files where I don't even want it to be! That slows down compilation unnecessarily (which irks me). For example, with Henry's proposal, I have to put this into my header file:

smart_tp.h:
---------------------------------------------------------------
class T {
	/* ... the complete definition of T ...*/
	friend class smart_pointer_to_T;
};

class smart_pointer_to_T {
	/* ... definition of smart_pointer_to_T ... */
};
---------------------------------------------------------------

Here, the definitions of both classes have to be scanned and compiled for each .C file which includes "smart_tp.h". Many of these may not even need to know *any* of the details of the definition of class T.
Likewise, for Marshall's proposal, I need:

smart_tp.h:
---------------------------------------------------------------
class smart_pointer_to_T {
	class T {
		/* ... the complete definition of T ...*/
	};
	/* ... definition of smart_pointer_to_T ... */
};
---------------------------------------------------------------

Which is equally wasteful of compile time.

Now somebody else was asking over in comp.std.c++ if it was legal to incompletely declare a nested class, so that you could have (for example):

smart_tp.h:
---------------------------------------------------------------
class smart_pointer_to_T {
	class T;	/* incomplete declaration of T */
	/* ... definition of smart_pointer_to_T ... */
};
---------------------------------------------------------------

and then later on in a different file:

complete_t.C:
---------------------------------------------------------------
#include "smart_tp.h"

class smart_pointer_to_T::T {
	/* completion of type smart_pointer_to_T::T */
};
---------------------------------------------------------------

In my opinion, that would be "way cool" if you could do that, but I don't think that it is legal. Furthermore, even if it is legal, it only provides a way of eliminating one of my two objections to Marshall's proposed solution to the "smart pointer problem". The other (more important) objection still remains. You still couldn't declare T objects all over the place. You could only create them where the smart_pointer_to_T type would let you (probably only in the heap).

My initial proposal was intended to solve the "smart pointer problem" while keeping the language "immutable", allowing declarations of T objects in most places, and avoiding any need to have a complete definition of the type T precede the definition of the type smart_pointer_to_T. I believe that my proposal did all that, but I'm now starting to wonder if it was really such a hot idea after all.
My proposal simply provided a means for telling the compiler that (in certain contexts) it should treat uses of type T* values as illegal (thus forcing the user to use the smart pointer type in those contexts instead).

Perhaps I grabbed the problem by the wrong end. I now believe that it might be equally effective to simply make it impossible to even generate a valid (non-null) stupid pointer-to-T value in certain contexts. Obviously, if you can prevent valid values of type T* from leaking out into some area then you don't even need to worry about whether or not operations on T*'s are restricted over that area.

Obviously, for a class type T, you can overload operator& (either as a member function of T or as a global function taking a T&). That right there puts you in control of most of the cases where a T* could potentially be generated. Unfortunately, there are others that you (currently) can't control.

As Jeremy Grodberg (jgro@lia.com) noted, the language rules currently say that if you new() an array of objects of some type T, the global operator new is invoked for this regardless of whether or not the type T has its own class-specific operator new() defined. As a result, whenever you new() an array of T, you'll get back a value of type T* even if you would have preferred getting back a value of type smart_pointer_to_T. This is one means by which unwanted (but valid and non-void) values of type T* may leak into some context. This leakage is very bad and it ought to be rectified by x3j16.

Also, there is one more leakage problem. Given some local or global variable called `ta' of type array of T, the following expression yields a value of type T* even if the type T has its own class-specific operator& defined for it:

	ta

That's it! The name of an array is generally converted (implicitly) into a pointer to the zeroth member of that array.
This implicit conversion currently circumvents any class-specific operator& definition (if one is present) for the class T and allows values of type T* to leak into contexts where they may not be welcome.

If both of these unfortunate leaks in the language could be plugged, we might be able to achieve really water-tight "safe" smart pointer types just by overloading operator& (and having it yield a smart pointer type) for some "controlled" type T.

Both leaks could be easily plugged while doing little harm to the existing language.

For the first leak, it would be easy enough to say that a class-specific operator new() for a class T is called whenever a single object *or* an array of objects of type T is new'ed. Such operators could then be defined by the user to return some smart pointer type.

For the second leak, we could simply redefine the semantics of "array-name" (where "array-name" names an array of objects of some class type) to be equivalent to invoking (implicitly) an applicable operator& (either member or global) on the zeroth element of the array.

There now. That was simple, eh?

Note that by plugging these two leaks, we have not destroyed the user's ability to declare objects of type T (or even objects of type T*), but what we have done is to give the user all of the tools he/she needs in order to ensure that no useful values of type T* (other than NULL) ever leak into a given area (where they might be misused).
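
The operator& approach and the array-name leak can both be seen in a short sketch. This is only an illustration under assumptions: the `value' member and the helper array_name_yields_raw_pointer() are hypothetical, the smart pointer does nothing smart, and the type checks rely on C++11 facilities (decltype, auto, <type_traits>) that post-date this discussion. It shows the behavior of the language as it stands, not the proposed fix.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

class smart_pointer_to_T;

class T {
public:
    int value = 0;
    // Class-specific unary &: hands out the smart type instead of a T*.
    smart_pointer_to_T operator&();
};

// Hypothetical smart pointer; a real one would count references or
// cooperate with a collector. Here it only wraps the raw pointer.
class smart_pointer_to_T {
    T* raw_;
public:
    explicit smart_pointer_to_T(T* p) : raw_(p) {}
    T& operator*() const { return *raw_; }
};

inline smart_pointer_to_T T::operator&() { return smart_pointer_to_T(this); }

// Applying & to a T lvalue goes through the overload.
constexpr bool address_of_yields_smart_pointer =
    std::is_same<decltype(&std::declval<T&>()), smart_pointer_to_T>::value;

// The array name converts straight to T*, never consulting operator&.
inline bool array_name_yields_raw_pointer() {
    T ta[4];
    auto p = ta;            // implicit array-to-pointer conversion
    p->value = 7;           // the leaked T* is fully usable
    return std::is_same<decltype(p), T*>::value
        && (*&ta[0]).value == 7;   // same object, reached via operator&
}
```

So the overload controls `&t' but has no say over `ta', which is the second leak described above.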