Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!sunybcs!oswego!news
From: dl@g.g.oswego.edu (Doug Lea)
Newsgroups: comp.lang.c++
Subject: Strings
Message-ID:
Date: 24 Oct 89 11:51:33 GMT
Sender: news@oswego.Oswego.EDU (Network News)
Reply-To: dl@oswego.edu
Distribution: comp
Organization: SUNY Oswego
Lines: 211

Two issues involving Strings...

Andy Koenig says...

> > a+b = c;
>
> > appears to be legal. (At least it compiled under 1.2.) Is it legal under
> > 2.0 ? What does it really mean? Shouldn't the '=' operator be forced
> > to only accept an lvalue as its left-hand operand?
>
> How about making operator+ return a const matrix?
> Then you won't be able to assign to it.
>
>
> To tell the truth, I hadn't thought about this issue until this
> question forced me to do so. There are zillions of things like
> string classes out there that say
>
>   extern String operator+(const String&, const String&);
>
> and apparently they really should say
>
>   extern const String operator+(const String&, const String&);

This doesn't seem like the right solution. Consider

  String& addeol(String& s) { s += "\n"; return s; }

  main()
  {
    String a, b;
    //...
    String c = addeol(a+b);
    //...
  }

which would be illegal if operator+ returned a const String. (Yes,
the form of `addeol' is contrived, but not indefensible.)

Actually, I think the `a+b = c;' issue is more of a curiosity -- an
inherent difference between classes and builtins -- than a real
problem. (There are a couple of other class vs builtin differences
along these lines that I briefly mentioned in my Denver Usenix paper.)
The code is legal and compiles (at least with libg++ Strings), but
results in a temporary being created for (a+b), then modified via the
assignment (=c), but never bound to any symbol, and so inaccessible.
While this looks odd, it does do exactly what the programmer
specified.

In an unrelated thread, Jerry Schwarz says...

> Indeed, more than discussed.
> This is essentially the method
> used by the AT&T 1.2 stream package. There are several
> problems with it. Where does the space come from for the string?
> How about all the twiddles on formatting available in stdio?
> (e.g. the case of the alphabetic "digits" in a hex number)
>
> But you don't have to choose. It's fairly easy to implement
> the functionality of the above without intermediate strings.
>
> One (among several choices) is
>
>   class decimalString {
>   public:
>     decimalString(int v, int w) : value(v), width(w) { }
>     int value ;
>     int width ;
>   } ;
>
>   ostream& operator<< (ostream& o, decimalString& s)
>   {
>     int f = o.flags();           // save the caller's flags
>     o << dec << setw(s.width) << s.value ;
>     o.setf(f,ios::basefield);    // restore the caller's base
>     return o ;
>   }
>
> There is a philosophical point here. In C the builtin types are
> special. It's perfectly reasonable to have a C I/O library that
> has a lot of formatting stuff for them. In C++ user defined classes
> are just as important as the builtin types. What is important is
> not that there be a lot of formatting stuff for the builtin types,
> but that there be a mechanism for extending the I/O. In C++ it is
> usually much better to determine styles of printing, widths and
> the like based on the role (type) of the data rather than
> specifying it at each individual I/O statement.
>
> In hindsight I think I put too much special stuff in the
> iostream library for the builtin types. Historically, what
> happened was that the builtin type stuff was done first, and
> only much later did I develop the extensibility features
> (such as xalloc).

I see the basic problem here just a little differently.
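Before turning to that, a compilable reading of the quoted
decimalString suggestion may help. This is only a sketch in current
standard C++ (std:: names, fmtflags instead of int), not the code of
any particular stream package. It assumes the width member is what
gets passed to setw and that the saved flags are what gets restored.
The render() helper is hypothetical, added only so the result can be
inspected as a string.

```cpp
#include <cassert>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Carries a value together with the field width its role calls for.
class decimalString {
public:
    decimalString(int v, int w) : value(v), width(w) { assert(w >= 0); }
    int value;
    int width;
};

// Print in decimal at the requested width, then put back the caller's base.
// The digits go straight to the stream; no intermediate string is built.
std::ostream& operator<<(std::ostream& o, const decimalString& s) {
    std::ios::fmtflags f = o.flags();        // remember the caller's flags
    o << std::dec << std::setw(s.width) << s.value;
    o.setf(f, std::ios::basefield);          // restore only the base bits
    return o;
}

// Hypothetical helper for illustration: put the stream in hex mode,
// print value via decimalString, then print it again as a plain int.
std::string render(int value, int width) {
    std::ostringstream out;
    out << std::hex << decimalString(value, width) << ' ' << value;
    return out.str();
}
```

With these definitions render(255, 6) gives "   255 ff": the
decimalString is printed in decimal, right-justified in six columns,
and the stream is back in hex for the plain int that follows.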
The most primitive stream output routine for printing strings might
go something like:

  ostream& ostream::put(const char* p)
  {
    while (*p != 0) put(*p++);
    return *this;
  }

This can be problematic if you'd like to have ostream << int do
something like

  char* dec(int i);

  ostream& operator << (ostream& s, int i) { return s.put(dec(i)); }

since, as Jerry notes, you then have to decide how to allocate the
space for the results of dec(). To make dec() reasonably general, you
can't just use a fixed static buffer, or else

  cout << dec(10) << dec(20);

would not work right if, for example, the compiler uses right-to-left
evaluation (which is legal): both calls would then write into the
same buffer before anything is printed.

So instead, you might want to get around this by employing your
off-the-shelf String class:

  class String
  {
    char* s;
  public:
    operator const char* () { return s; }
    //... lots of other stuff
  };

and redo dec() as

  String dec(int i);

but now, something even more unfortunate can happen in

  ostream& operator << (ostream& s, int i) { return s.put(dec(i)); }

Since dec() returns a String, but put() wants a char*, the String
operator const char* () conversion is made. However, this too can
fail! The reason has to do with C++ lifetime rules for temporaries:
The temp String returned by dec is `used up' by the char* conversion,
so the compiler is allowed to kill it off *before* entering put().
But the `conversion' really just returns a pointer into the String,
so if the String is killed off, the pointer is invalid, and things
are broken again.

In other words, the char* conversion operator cannot just return a
pointer, it must allocate some space, and copy the String
representation. But where? Back to square one.

Here are some solutions:

1) Make an ostream << String operator, and use it exclusively instead
   of char*'s, from the ground up, in ostreams.

   This is the right solution in many senses, but is problematic in
   that it presupposes that there is a single, best String class out
   there suitable for all needs. But there are many good String
   classes around.
   Standardizing on a particular version to serve as the basis for
   the de facto standard stream library seems premature.

2) Change the C++ rules about lifetimes for temporaries, so that
   they, like `normal' variables, have lifetimes to the end of the
   enclosing scope.

   This solution has merit on other grounds as well, but also creates
   some of its own difficulties.

   Actually, this may be going too far. The lifetime rules for
   temporaries say that if a *reference* to a temp (or any part
   thereof?) is taken (or any ref-returning member function is
   called?), then its lifetime *is* to the end of the enclosing
   scope. The char* conversion *behaves* like a reference, but is not
   one. I once proposed that C++ allow the idiom of a char[]& to mean
   a reference to a character array. Support of this would solve this
   (and other) problems, since one could create a
   char[]& String::chars() { return s /* or whatever */ ; },
   call it inside the ostream << int via
   `return s.put(dec(i).chars())', and everything would work just
   right. But no one has ever told me that they particularly like
   this idea.

3) Have dec() and friends return freestore allocated space, and
   require that programmers manually delete them.

   Most users wouldn't like this very much.

4) Use a garbage collection scheme for formatting strings, and/or
   Strings in general.

   This seems to be overkill for the problem at hand. Strings
   themselves are very well behaved lifetime-wise; it's the char*
   conversions that raise problems.

5) Create a simple approximation to garbage collection. Set up a pool
   of space to be used for miscellaneous conversions, and use it for
   dec(), oct(), and so on. Guarantee that the most recent N (some
   FIXED number, say, 100) formatting strings will be on hand at any
   given time. The pool manager can then reuse the space for old
   formatting strings when needed.

   Both AT&T 1.2 and libg++-1.36.0 use some variation of this
   approach. The String const char*() operator may also copy into
   this pool.
   The major drawback is that if programmers contrive expressions
   that require more than N live formatting strings, then they are
   out of luck.

6) Avoid reliance on generic conversion functions like dec(), and
   build special conversion buffers, etc., into the stream classes.

   AT&T 2.0 streams appear to do something along these lines. As
   Jerry says, this puts too much smarts in the stream classes, but
   is entirely safe. Unfortunately, it is also not as easily
   extensible as one might like. It is awkward (although not
   impossible) to use this scheme to output, say, arbitrary-precision
   Integers or other types in which the user class, not the stream
   class, knows how to set things up for formatting. It also limits
   generality a bit. Formatting strings are sometimes needed for
   purposes other than ostream output.

--
Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367
email: dl@oswego.edu or dl%oswego.edu@nisc.nyser.net
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl
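
The pool idea in 5) can be sketched roughly as below. This is a
minimal illustration, not the AT&T 1.2 or libg++ code: the names
pooled_dec, kSlots and kSlotSize are made up here, N is set to 4
rather than 100 so the wrap-around is easy to see, and each slot has
a fixed size that is assumed to be large enough for a decimal int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical pool: the most recent kSlots results stay valid;
// older ones are overwritten when their slot comes round again.
const std::size_t kSlots = 4;      // the "N" of the text, tiny for the demo
const std::size_t kSlotSize = 32;  // assumed ample for a decimal int plus NUL

static char pool[kSlots][kSlotSize];
static std::size_t next_slot = 0;

// Return a decimal rendering of i. The pointer stays valid until
// kSlots further calls have been made, after which the slot is reused.
const char* pooled_dec(int i) {
    char* buf = pool[next_slot];
    next_slot = (next_slot + 1) % kSlots;    // advance round-robin
    int n = std::snprintf(buf, kSlotSize, "%d", i);
    assert(n > 0 && static_cast<std::size_t>(n) < kSlotSize);
    (void)n;                                 // silence unused warning under NDEBUG
    return buf;
}
```

With this in place, cout << pooled_dec(10) << pooled_dec(20) is safe
whatever order the two calls are evaluated in, since each result sits
in its own slot. A fifth call reuses the first slot, however, so a
pointer kept from the first call then sees the new digits. That is
the drawback noted under 5), just with a smaller N.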