Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!nrl-cmf!ukma!tut.cis.ohio-state.edu!rutgers!att!ulysses!andante!alice!dmr
From: dmr@alice.UUCP
Newsgroups: comp.arch
Subject: Re: String lengths
Message-ID: <8876@alice.UUCP>
Date: 6 Feb 89 06:56:52 GMT
Organization: AT&T Bell Laboratories, Murray Hill NJ
Lines: 59

The question arose: why does C use a terminating character for strings instead of a count?

Discussion of the representation of strings in C is not fruitful unless it is realized that there are no strings in C. There are character arrays, which serve a similar purpose, but no strings. Things very deep in the design of the language, and in the customs of its use, make strings a mess to add.

The intent was that the behavior of character arrays should be exactly like that of other arrays, and the hope was that stringish operations on these character arrays should be convenient enough. The interplay of pointers and arrays, and the possible representations for them, place strong constraints on what one can do if one wants real strings (counted sequences of characters) in the context of the existing language, in particular if types char* or char[] are going to be counted strings. In general it is hard to account for the space in which to put the count, and also to make sure that it can be updated properly under all operations. For example, 'sizeof' is used for allocation, and it is hard to make this use compatible with a count. Similarly, in practice, most implementations make 'struct { char s1[3]; char s2[5]; }' say something about the storage layout that doesn't mix well with a count.

Given the explicit use of character arrays, and explicit pointers to sequences of characters, the conventional use of a terminating marker is hard to avoid. The history of this convention and of the general array scheme had little to do with the PDP-11; it was inherited from BCPL and B.
Of course, it is possible to imagine adding a primitive string type to C, and to put in some useful operations like concatenation, search, and substring. This would not really be a good idea, because this new primitive type would continually be at war with the existing character pointers and arrays. In the context of C (even with ANSI function prototypes) it would be quite difficult to make a string type usable in all the places it should be.

In extensible languages like C++, and of course in languages in which the notion is designed in from the start, strings are fine. (However, even in C++, where it is readily possible to define your own string class, it would take quite a lot of work to make this class work smoothly with the entire existing library.)

In my opinion, C's array/pointer scheme for representation of character strings has worked out reasonably well, although it is somewhat clumsy when there are lots of string operations. I don't think it has been demonstrated that the usual run of C programs pays an extremely high cost in performance for their string operations, though doubtless there are counterexamples for particular machine architectures or particular programs.

	Dennis Ritchie
	att!research!dmr
	dmr@research.att.com