Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!ucsd!orion.cf.uci.edu!uci-ics!paris.ics.uci.edu
From: nagel@paris.ics.uci.edu (Mark Nagel)
Newsgroups: comp.arch
Subject: Re: String lengths
Message-ID: <6830@paris.ics.uci.edu>
Date: 5 Feb 89 23:19:06 GMT
References: <6133@columbia.edu> <5124@aldebaran.UUCP> <7@microsoft.UUCP> <213@nbires.nbi.com> <38@microsoft.UUCP> <219@nbires.nbi.com> <14331@cup.portal.com>
Sender: news@paris.ics.uci.edu
Reply-To: nagel@paris.ics.uci.edu (Mark Nagel)
Distribution: na
Organization: University of California, Irvine - Dept of ICS
Lines: 32
In-reply-to: PLS@cup.portal.com (Paul L Schauble)

In article <14331@cup.portal.com>, PLS@cup (Paul L Schauble) writes:
|This deserves a new subject.
|
|Since it was mentioned in the Endian Wars, does anyone know why C uses the
|null terminated string rather than an explicit length? It seems like such
|an odd choice considering that
|  - It removes a character from the character set, a source of many C
|    bugs, and
|  - All machines I know of that have character string instructions want
|    the length of the string. This forces the string primitives to first
|    scan for null, a time wasting operation.
|
|There must have been a reason. What is it?

Hmm.  There are two things going on here.  One is that you want to
have truly variable-length strings.  You can do it the C way, with a
sentinel at the end, or you can adopt some more complicated method,
such as having several string types or a variable-length length
prefix.  I think the implementors chose the simplest approach, betting
that in the average case the cost of scanning a string would be small
(and that the length, once computed, would be cached in whatever data
structure needed it).  The other thing (once the sentinel method was
chosen) was to pick the terminating character.  I don't think NUL is
used for much of anything in text, so it was a good candidate.  I've
also heard that NUL was chosen to limit the damage when an
end-of-string character is missing: a runaway scan is likely to hit a
zero byte before it wanders very far.  What single byte value is more
prevalent in machine code than zero?

Mark Nagel @ UC Irvine, Dept of Info and Comp Sci
ARPA: nagel@ics.uci.edu              | Charisma doesn't have jelly in the
UUCP: {sdcsvax,ucbvax}!ucivax!nagel  | middle. -- Jim Ignatowski