Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!sunybcs!oswego!news
From: dl@g.g.oswego.edu (Doug Lea)
Newsgroups: comp.lang.c++
Subject: Strings
Message-ID:
Date: 24 Oct 89 11:51:33 GMT
Sender: news@oswego.Oswego.EDU (Network News)
Reply-To: dl@oswego.edu
Distribution: comp
Organization: SUNY Oswego
Lines: 211
Two issues involving Strings...
Andy Koenig says...
> > a+b = c;
>
> > appears to be legal. (At least it compiled under 1.2.) Is it legal under
> > 2.0 ? What does it really mean? Shouldn't the '=' operator be forced
> > to only accept an lvalue as its left-hand operand?
>
> How about making operator+ return a const matrix?
> Then you won't be able to assign to it.
>
>
> To tell the truth, I hadn't thought about this issue until this
> question forced me to do so. There are zillions of things like
> string classes out there that say
>
> extern String operator+(const String&, const String&);
>
> and apparently they really should say
>
> extern const String operator+(const String&, const String&);
This doesn't seem like the right solution. Consider
    String& addeol(String& s) { s += "\n"; return s; }

    main()
    {
        String a, b; //...
        String c = addeol(a+b);
        //...
    }
which would be illegal if operator+ returned a const String. (Yes,
the form of `addeol' is contrived, but not indefensible.)
Actually, I think the `a+b = c;' issue is more of a curiosity -- an
inherent difference between classes and builtins -- than a real
problem. (There are a couple of other class vs builtin differences
along these lines that I briefly mentioned in my Denver Usenix paper.)
The code is legal and compiles (at least with libg++ Strings), but
results in a temporary being created for (a+b), then modified via the
assignment (=c), but never bound to any symbol, so inaccessible.
While this looks odd, it does do exactly what the programmer
specified.
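
A minimal sketch of this behavior, assuming a hypothetical `Str` class (a stand-in built on `std::string`, not the libg++ String) and a hypothetical helper `assign_to_sum`. It only illustrates that the assignment lands in the temporary and leaves both operands alone:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a String class; not libg++'s String.
class Str {
    std::string rep;
public:
    Str(const char* p = "") : rep(p) {}
    const std::string& text() const { return rep; }

    // Returns a fresh temporary, as the usual String operator+ does.
    friend Str operator+(const Str& x, const Str& y) {
        Str r;
        r.rep = x.rep + y.rep;
        return r;
    }
};

// Assigns to the temporary produced by a + b, then reports what
// a and b hold afterwards.
std::string assign_to_sum(Str a, Str b, const Str& c) {
    (a + b) = c;   // legal for class types: the temporary is overwritten, then discarded
    return a.text() + "|" + b.text();
}

// Usage: neither operand is affected by the assignment.
void demo_assign_to_sum() {
    assert(assign_to_sum("x", "y", "z") == "x|y");
}
```

If `operator+` were declared to return `const Str` instead, the marked line would be rejected, because the assignment operator is a non-const member.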
In an unrelated thread, Jerry Schwarz says...
> Indeed, more than discussed. This is essentially the method
> used by the AT&T 1.2 stream package. There are several
> problems with it. Where does the space come from for the string?
> How about all the twiddles on formatting available in stdio?
> (e.g. the case of the alphabetic "digits" in a hex number)
>
> But you don't have to choose. It's fairly easy to implement
> the functionality of the above without intermediate strings.
>
> One (among several choices) is
>
> class decimalString {
> public:
>     decimalString(int v, int w) : value(v), width(w) { }
>     int value ;
>     int width ;
> } ;
>
> ostream& operator<< (ostream& o, const decimalString& s)
> {
>     int f = o.flags();
>     o << dec << setw(s.width) << s.value ;
>     o.setf(f,ios::basefield);
>     return o ;
> }
>
> There is a philosophical point here. In C the builtin types are
> special. It's perfectly reasonable to have a C I/O library that
> has a lot of formatting stuff for them. In C++ user defined classes
> are just as important as the builtin types. What is important is
> not that there be a lot of formatting stuff for the builtin types,
> but that there be a mechanism for extending the I/O. In C++ it is
> usually much better to determine styles of printing, widths and
> the like based on the role (type) type of the data rather than
> specifying it at each individual I/O statement.
>
> In hindsight I think I put too much special stuff in the
> iostream library for the builtin types. Historically, what
> happened was that the builtin type stuff was done first, and
> only much later did I develop the extensibility features
> (such as xalloc).
I see the basic problem here just a little differently. The most
primitive stream output routine for printing strings might go
something like:
    ostream& ostream::put(const char* p)
    {
        while (*p != 0) put(*p++);
        return *this;
    }
This can be problematic if you'd like to have ostream << int do
something like
    char* dec(int i);
    ostream& operator << (ostream& s, int i) { return s.put(dec(i)); }
since, as Jerry notes, you then have to decide how to allocate the
space for the results of dec(). To make dec() reasonably general, you
can't just use a fixed static buffer, or else
    cout << dec(10) << dec(20);
would not work right if, for example, the compiler uses right-to-left
evaluation (which is legal): the later call to dec() then overwrites
the buffer before the earlier result has been written.
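
To make the failure concrete without depending on any particular evaluation order, here is a rough sketch. `dec_static` and `first_result_survives` are hypothetical names, and the 32-byte buffer is an assumption. The two calls are made explicitly before either result is used, which is what the compiler is allowed to do in the chained expression:

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical dec() that formats into one fixed static buffer.
const char* dec_static(int i) {
    static char buf[32];
    std::snprintf(buf, sizeof buf, "%d", i);
    return buf;
}

// Calls dec_static() twice before using either result, then checks
// whether the first result still reads "10".
bool first_result_survives() {
    const char* first = dec_static(10);
    const char* second = dec_static(20);   // overwrites the shared buffer
    (void)second;
    return std::strcmp(first, "10") == 0;
}

// Usage: the first result is lost, since both pointers name one buffer.
void demo_static_buffer() {
    assert(!first_result_survives());
}
```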
So instead, you might want to get around this by employing your
off-the-shelf String class:
    class String
    {
        char* s;
    public:
        operator const char* () { return s; }
        //... lots of other stuff
    };
and redo dec() as
    String dec(int i);
but now, something even more unfortunate can happen in
    ostream& operator << (ostream& s, int i) { return s.put(dec(i)); }
Since dec() returns a String, but put() wants a char*, the String
operator const char* () conversion is made. However, this too
can fail! The reason has to do with C++ lifetime rules for
temporaries: The temp String returned by dec is `used up' by
the char* conversion, so the compiler is allowed to kill it off
*before* entering put(). But the `conversion' really just returns
a pointer into the String, so if the String is killed off, the pointer
is invalid, and things are broken again. In other words, the char*
conversion operator cannot just return a pointer; it must allocate
some space and copy the String representation into it. But where?
Back to square one.
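
The hazard can be observed safely with a sketch like the following, where `Str`, `dec_str`, and the live-object counter are all hypothetical and exist only for illustration. With compilers that destroy temporaries at the end of the full expression, the call put(dec(i)) happens to be safe, but the same failure appears as soon as the pointer outlives the statement, which is what this sketch does. The dangling pointer is never dereferenced; the counter shows that the String it pointed into is already gone.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical minimal String that counts live objects, so the death of
// a temporary can be observed without touching freed memory.
class Str {
    char* s;
public:
    static int live;
    explicit Str(const char* p) : s(new char[std::strlen(p) + 1]) {
        std::strcpy(s, p);
        ++live;
    }
    Str(const Str& o) : s(new char[std::strlen(o.s) + 1]) {
        std::strcpy(s, o.s);
        ++live;
    }
    Str& operator=(const Str&) = delete;
    ~Str() { delete[] s; --live; }
    operator const char*() const { return s; }   // pointer into the representation
};
int Str::live = 0;

// Hypothetical String-returning dec().
Str dec_str(int i) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%d", i);
    return Str(buf);
}

// Returns how many Str objects are alive once the pointer has been taken.
int live_after_conversion() {
    const char* p = dec_str(10);   // the temporary Str dies at the end of this statement
    (void)p;                       // p now dangles and must not be dereferenced
    return Str::live;
}

// Contrast: a named String keeps the representation alive while p is used.
bool named_string_is_safe() {
    Str t = dec_str(10);
    const char* p = t;
    return Str::live == 1 && std::strcmp(p, "10") == 0;
}

// Usage: the temporary is gone while the pointer is still held.
void demo_temporary_lifetime() {
    assert(live_after_conversion() == 0);
    assert(named_string_is_safe());
}
```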
Here are some solutions:
1) Make an ostream << String operator, and use it exclusively instead
of char*'s, from the ground up, in ostreams. This is the right
solution in many senses, but is problematic in that it presupposes
that there is a single, best String class out there suitable for all
needs. But there are many good String classes around. Standardizing on
a particular version to serve as the basis for the de facto standard
stream library seems premature.
2) Change the C++ rules about lifetimes for temporaries, so that they,
like `normal' variables, have lifetimes to the end of the enclosing
scope. This solution has merit on other grounds as well, but also
creates some of its own difficulties. Actually, this may be going too
far. The lifetime rules for temporaries say that if a *reference* to
a temp (or any part thereof?) is taken (or any ref-returning member
function is called?), then its lifetime *is* to the end of the enclosing
scope. The char* conversion *behaves* like a reference, but is not
one. I once proposed that C++ allow the idiom of a char[]& to mean a
reference to a character array. Support of this would solve this (and
other) problems, since one could create a
char[]& String::chars() { return s /* or whatever */ ; },
call it inside the ostream << int via `return put(dec(i).chars())',
and everything would work just right. But no one has ever told me
that they particularly like this idea.
3) Have dec() and friends return freestore allocated space, and
require that programmers manually delete them. Most users wouldn't
like this very much.
4) Use a garbage collection scheme for formatting strings, and/or
Strings in general. This seems to be overkill for the problem
at hand. Strings themselves are very well behaved lifetime-wise; it's
the char* conversions that raise problems.
5) Create a simple approximation to garbage collection. Set up a pool
of space to be used for miscellaneous conversions, and use it for
dec(), oct(), and so on. Guarantee that the most recent N (some FIXED
number, say, 100) formatting strings will be on hand at any given
time. The pool manager can then reuse the space for old formatting
strings when needed. Both AT&T 1.2 and libg++-1.36.0 use some
variation of this approach. The String const char*() operator may also
copy into this pool. The major drawback is that if programmers
contrive expressions that require more than N live formatting
strings, then they are out of luck.
6) Avoid reliance on generic conversion functions like dec(), and
build special conversion buffers, etc., into the stream classes. AT&T
2.0 streams appear to do something along these lines. As Jerry says,
this puts too much smarts in the stream classes, but is entirely safe.
Unfortunately, it is also not as easily extensible as one might like.
It is awkward (although not impossible) to use this scheme to output,
say, arbitrary-precision Integers or other types in which the user
class, not the stream class, knows how to set things up for formatting.
It also limits generality a bit. Formatting strings are sometimes
needed for other purposes than ostream output.
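
The pool in solution 5 can be sketched roughly as follows. This is not the AT&T or libg++ code: the names `next_slot`, `dec_pool`, and `oct_pool`, the slot count of 4 (kept small so the wraparound is easy to see), and the fixed 32-byte slot size are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical pool: the POOL_SLOTS most recent formatting strings stay valid.
const std::size_t POOL_SLOTS = 4;    // a real pool would use something like 100
const std::size_t SLOT_BYTES = 32;   // assumed large enough for any formatted int

// Hands out slots round-robin, so the oldest string is the next one reused.
char* next_slot() {
    static char pool[POOL_SLOTS][SLOT_BYTES];
    static std::size_t next = 0;
    char* slot = pool[next];
    next = (next + 1) % POOL_SLOTS;
    return slot;
}

// Formats i in decimal into the next pool slot.
const char* dec_pool(int i) {
    char* slot = next_slot();
    std::snprintf(slot, SLOT_BYTES, "%d", i);
    return slot;
}

// Formats i in octal into the next pool slot.
const char* oct_pool(unsigned i) {
    char* slot = next_slot();
    std::snprintf(slot, SLOT_BYTES, "%o", i);
    return slot;
}

// Usage: two live results no longer clobber each other.
void demo_pool() {
    const char* x = dec_pool(10);
    const char* y = dec_pool(20);
    assert(std::strcmp(x, "10") == 0);
    assert(std::strcmp(y, "20") == 0);
}
```

An expression that keeps more than POOL_SLOTS results alive at once silently gets a recycled slot, which is exactly the drawback noted in solution 5.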
--
Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367
email: dl@oswego.edu or dl%oswego.edu@nisc.nyser.net
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl