Path: utzoo!utgpu!utstat!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!uxc!uxc.cso.uiuc.edu!mcdurb!aglew
From: aglew@mcdurb.Urbana.Gould.COM
Newsgroups: comp.arch
Subject: Re: String lengths
Message-ID: <28200270@mcdurb>
Date: 9 Feb 89 03:09:00 GMT
References: <762@atanasoff.cs.iastate.edu>
Lines: 56
Nf-ID: #R:atanasoff.cs.iastate.edu:762:mcdurb:28200270:000:2464
Nf-From: mcdurb.Urbana.Gould.COM!aglew Feb 8 21:09:00 1989

>/* Written 3:15 pm Feb 6, 1989 by GQ.RLG@forsythe.stanford.edu in mcdurb:comp.arch */
>->[Me] aglew@mcdurb.Urbana.Gould.COM writes:
>->May I encourage people implementing string libraries to use an extra
>->level of indirection? Instead of length immediately preceding the string,
>->let length be associated with a pointer to the string. Makes
>->substringing operations much easier, and has the ability to reduce
>->unnecessary copies (at the risk of increased aliasing).
>->
>->  +------+---+
>->  |length|ptr|
>->  +------+---+
>->           |
>->    +------+
>->    |
>->    V
>->  +---+---+---+---+---+---+---+---+---+---+---+---+---+
>->  | H | E | L | L | O | , |   | W | O | R | L | D | \n|
>->  +---+---+---+---+---+---+---+---+---+---+---+---+---+
>
>Such an implementation has adverse effects when the string is sent
>to/from an external device, such as a file. The 'length' must be
>with the string, or the string needs a terminator character.

If you are sending directly to an output device, I doubt that your
output device accepts your internal format. If you have to reformat
anyway...

Oh, you mean storing data in a file. What's a file? You mean this
memory-mapped object... Sorry, I don't live in that environment,
unfortunately.

Yep, you have to decide either way. For text strings, ASCII files or
binary files are fine by me. Leading counts are fine. Nothing says that
ptr could not point to the very next location.

>what happens to the 'length' information for the old string?

I sure would hope it got changed appropriately!
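
For concreteness, here is roughly what the length-plus-pointer descriptor
in the diagram might look like in C. This is only a sketch under my own
assumptions: the names strdesc, sd_make, sd_substr, sd_equal and sd_write
are made up for illustration and do not come from any existing library.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor: the length travels with the pointer,
 * not with the bytes it points at. */
struct strdesc {
    size_t      length;  /* number of bytes, embedded NULs included */
    const char *ptr;     /* first byte; need not be NUL-terminated  */
};

/* Describe an existing buffer of n bytes without copying it. */
struct strdesc sd_make(const char *buf, size_t n)
{
    struct strdesc s;
    s.length = n;
    s.ptr = buf;
    return s;
}

/* Substring of at most n bytes starting at offset start.
 * No bytes are copied: the result shares storage with s (the
 * aliasing risk mentioned above). Out-of-range requests are clamped. */
struct strdesc sd_substr(struct strdesc s, size_t start, size_t n)
{
    if (start > s.length)
        start = s.length;
    if (n > s.length - start)
        n = s.length - start;
    return sd_make(s.ptr + start, n);
}

/* Byte-wise equality; memcmp does not stop at an embedded NUL. */
int sd_equal(struct strdesc a, struct strdesc b)
{
    return a.length == b.length && memcmp(a.ptr, b.ptr, a.length) == 0;
}

/* Write exactly length bytes to fp; returns the number written.
 * The external format (leading count, terminator, ...) is a
 * separate decision made at this boundary. */
size_t sd_write(struct strdesc s, FILE *fp)
{
    return fwrite(s.ptr, 1, s.length, fp);
}
```

With this sketch, sd_substr(s, 7, 5) on the string in the diagram
describes "WORLD" without copying a byte, and a descriptor built over a
buffer with a NUL in the middle still reports the full length, where
strlen would stop at the NUL.
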
And I sure would hope that the use was wrapped in a library routine or
macro or C++ type object interface so that nobody ever accessed the
length and ptr explicitly!

Look, null terminated is fine by me, I use it every day. It just has the
embedded null drawback, and the fact that it encourages dumb code.
Several examples of which (dumb code that scans the string twice) are on
my list of things to fix real soon now - one is taking up 10% of a
loaded system. And, yes, good coding practices can avoid double
scanning, so all that you're left with is the embedded null problem.

(Talking about dumb code - has anyone else seen things like

    #define TERM_ESCAPE_CODE "\e[foo\0bar"
    puts(TERM_ESCAPE_CODE); /* Do escape code magic with terminal? */

particularly in things where the escape code is computed?)

And I sure would never let any oiu