Path: utzoo!mnetor!uunet!husc6!cmcl2!brl-adm!umd5!purdue!i.cc.purdue.edu!j.cc.purdue.edu!pur-ee!uiucdcs!uxc.cso.uiuc.edu!ccvaxa!aglew
From: aglew@ccvaxa.UUCP
Newsgroups: comp.arch
Subject: Re: Null-terminated C strings
Message-ID: <28200080@ccvaxa>
Date: 24 Dec 87 04:49:00 GMT
References: <174@quick.COM>
Lines: 54
Nf-ID: #R:quick.COM:174:ccvaxa:28200080:000:2661
Nf-From: ccvaxa.UUCP!aglew Dec 23 22:49:00 1987

..> Strings

Oh, for Christopher's sake! Both sentinel-terminated strings (of which
null-terminated are probably the most frequent example) and length strings
are useful. Like somebody else said, there's nothing new.

How about some architecturally oriented discussion:

(1) What machines have had support for the various formats?

(1.1) Has anybody combined the two? Myself, I tend to like dope strings
with a maximum length AND a null byte.

(2) For the RISCers, what simple instruction sequences best support the
various formats? E.g., somebody from ACORN described the general XOR trick
to find the last byte in a string. What are the appropriate tricks for
length strings and dope strings?

(2.1) What simple tricks on a vector machine make string manipulation
easier?

Myself, I tend to lean towards NO explicit string copy, etc., instructions.
But if the memory system has separate byte enable lines, I would want
something like STORE-REGISTER-BYTES-UNDER-MASK, with simple sequences for
generating the mask from your sentinel and/or your length. This leads to
some possibilities:

(3) How do you encode the mask? 1 bit per byte, or a number giving the
leading or trailing N bytes, or what? Myself, I prefer the mask to be all
bits, with an error if some byte in the mask is not all 1s or all 0s. But
then, I want a bit-addressed machine eventually.

(3.1) Is it worthwhile having special instructions to go from a word
containing several bytes to a mask with bits set for all the bytes up to
the first null byte? Myself, I don't think so. A bit of logical arithmetic
gives it to us in a few instructions.
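
To make the "bit of logical arithmetic" in (3.1) concrete, here is a minimal C sketch, not a claim about any particular machine. It assumes a 32-bit word whose byte 0 is the least significant byte (little-endian order), and it uses the all-bits mask encoding from (3). The function names are hypothetical, and store_bytes_under_mask is only a software stand-in for the STORE-REGISTER-BYTES-UNDER-MASK idea, which is not an instruction of any real machine that I know of.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit word, byte 0 = least significant byte
   (the lowest-addressed byte on a little-endian machine). */

/* Sentinel case: 0xFF in every byte position before the first null
   byte of w; all ones if w holds no null byte. */
uint32_t mask_before_null(uint32_t w)
{
    /* Sets the high bit of each byte that looks like zero. The lowest
       flagged byte is exactly the first zero byte; flags above it may
       be borrow artifacts, so only the lowest one is used. */
    uint32_t z = (w - 0x01010101u) & ~w & 0x80808080u;

    /* Isolate the lowest flag bit (stays 0 when there is no null). */
    uint32_t low = z & (0u - z);

    /* Move the flag to bit 0 of its byte, then subtract 1 to set every
       bit below it. With no null, 0 - 1 wraps to all ones. */
    return (low >> 7) - 1u;
}

/* Length case: mask covering the leading n bytes, 0 <= n <= 4. */
uint32_t mask_leading_bytes(unsigned n)
{
    assert(n <= 4);
    /* n == 4 is handled separately: shifting a 32-bit value by 32
       is undefined in C. */
    return n == 4 ? 0xFFFFFFFFu : (UINT32_C(1) << (8 * n)) - 1u;
}

/* Hypothetical stand-in for STORE-REGISTER-BYTES-UNDER-MASK: only the
   bytes selected by mask are replaced in *dst. */
void store_bytes_under_mask(uint32_t *dst, uint32_t src, uint32_t mask)
{
    *dst = (*dst & ~mask) | (src & mask);
}
```

Under these assumptions the sentinel mask costs roughly six simple operations per word (subtract, and-not, and, negate-and, shift, subtract), and a dope string could AND the sentinel mask with the length mask. A big-endian machine would need the mirror-image sequence, and without real byte enable lines the store becomes a read-modify-write as shown.
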
However, there is a real tradeoff here. If you need a few instructions on
every register-full of bytes, then you need a larger register, holding more
bytes, before mass string moves in large packets become worthwhile.

All this is in aid of the goal of moving as many characters as possible in
a single memory access, plus optimizing the handling of the heads and tails
of strings rather than the middles, since most strings that my programs
manipulate are small. Are these worthwhile goals? Give me numbers:

(4) What are the distributions of lengths, (mis)alignments, etc., for
frequently used string operations?

Andy "Krazy" Glew. Gould CSD-Urbana. 1101 E. University, Urbana, IL 61801
aglew@mycroft.gould.com    ihnp4!uiucdcs!ccvaxa!aglew    aglew@gswd-vms.arpa

My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.