Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!ucsd!orion.cf.uci.edu!uci-ics!paris.ics.uci.edu
From: nagel@paris.ics.uci.edu (Mark Nagel)
Newsgroups: comp.arch
Subject: Re: String lengths
Message-ID: <6830@paris.ics.uci.edu>
Date: 5 Feb 89 23:19:06 GMT
References: <6133@columbia.edu> <5124@aldebaran.UUCP> <7@microsoft.UUCP> <213@nbires.nbi.com> <38@microsoft.UUCP> <219@nbires.nbi.com> <14331@cup.portal.com>
Sender: news@paris.ics.uci.edu
Reply-To: nagel@paris.ics.uci.edu (Mark Nagel)
Distribution: na
Organization: University of California, Irvine - Dept of ICS
Lines: 32
In-reply-to: PLS@cup.portal.com (Paul L Schauble)

In article <14331@cup.portal.com>, PLS@cup (Paul L Schauble) writes:
|This deserves a new subject.
|
|Since it was mentioned in the Endian Wars, does anyone know why C uses the
|null terminated string rather than an explicit length? It seems like such
|an odd choice considering that
| - It removes a character from the character set, a source of many C
|   bugs, and
| - All machines I know of that have character string instructions want
|   the length of the string. This forces the string primitives to first
|   scan for null, a time wasting operation.
|
|There must have been a reason. What is it?

Hmm.  There are two things going on here.  One is that you want to
have truly variable-length strings.  You can do it the C way, or you
can adopt some more complicated method like having different string
types or a variable length string length indicator.  I think the
implementors chose the simplest approach, hoping that in the average
case, the overhead from scanning a string would be small (and
hopefully the value would be cached in whatever data structure needed
it).

The other thing (once the sentinel method was chosen) was to select
the proper terminating character.  I don't think NUL is used much
anywhere for anything and thus is a good candidate.
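
To make the trade-off concrete, here is a minimal C sketch of the two
approaches, plus the "cache the length in the data structure" idea.
The names scan_length, struct counted and counted_from are hypothetical
ones made up for illustration; they are not from the C library or from
any implementation discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel style: the length is not stored anywhere, so finding it
   means walking the bytes until the NUL terminator is reached. */
size_t scan_length(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical counted string: the length travels with the data, so
   asking for it costs nothing and no byte value has to be reserved. */
struct counted {
    size_t len;
    const char *data;
};

/* Pay for the scan once, then keep the result in the structure. */
struct counted counted_from(const char *s)
{
    struct counted c;

    c.len = scan_length(s);
    c.data = s;
    return c;
}
```

Here counted_from("hello").len is 5, and later users of the structure
never rescan.  Note that the scan stops at the first NUL, so a literal
with an embedded zero byte is cut short; that is the "lost character"
complaint in the quoted article.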
In addition, I've heard that NUL was chosen as a way to help prevent
overrunning the ends of strings by too much in the case of a missing
end-of-string character.  What single byte value is more prevalent in
machine code than zero?

Mark Nagel @ UC Irvine, Dept of Info and Comp Sci
ARPA: nagel@ics.uci.edu              | Charisma doesn't have jelly in the
UUCP: {sdcsvax,ucbvax}!ucivax!nagel  | middle. -- Jim Ignatowski