Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!cs.utexas.edu!uunet!munnari.oz.au!cs.mu.oz.au!ok
From: ok@cs.mu.oz.au (Richard O'Keefe)
Newsgroups: comp.lang.c
Subject: Re: ambiguous ?
Message-ID: <2522@munnari.oz.au>
Date: 25 Oct 89 05:50:21 GMT
References: <11398@smoke.BRL.MIL> <14115@lanl.gov>
Sender: news@cs.mu.oz.au
Lines: 83

In article <14115@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> From article <11398@smoke.BRL.MIL>, by gwyn@smoke.BRL.MIL (Doug Gwyn):
> > It looks to me like Bill Wells was merely stating the facts that are
> > apparent to him.  Your inane comments about C's character processing
> > being less efficient (than in other programming languages) back him
> > up in his assessment.

> I don't see how.  C character strings are null terminated rather
> than keeping the length of the string explicitly.  The result of this
> is that hardware with specialized instructions for character processing
> cannot be used as efficiently in C as with other languages.  The C
> strings always have to be prescanned to determine their length before
> the operation you are _really_ interested in can be performed.

You are both right.  C is an exceptionally good language for CHARACTER
processing.  C is a rather bad language for STRING processing.

I have an anecdote:  a friend of mine spent a couple of months writing
a fairly "batch" editor in PL/I to run on an IBM mainframe.  He made
extensive use of PL/I's CHARACTER(LENGTH) VARYING, that's what the type
is for, right?

(BEGIN DIGRESSION:

    #define DclCharVar(Vbl, N) struct { \
            int curlen; \
            char curtxt[N]; \
        } Vbl = {0}

    #define DclCharCon(Vbl, S) struct { \
            int curlen; \
            char curtxt[-1 + sizeof S]; \
        } Vbl = {-1 + sizeof S, S}

    DclCharVar(a, 20);
    DclCharCon(b, "A literal value");

gives you roughly the same effect as

    DCL A CHAR(20) VARYING, B CHAR(15) INITIAL("A literal value");

in PL/I, except that you're still missing the library of built in
functions and the compiler optimisations.
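
Still inside the digression, here is a minimal sketch of how such counted
strings might be used.  The names cv_append and CvAppend are hypothetical
(they come from no library), and truncating on overflow is an assumption
made for the sketch.  The only point is that the length is already stored,
so nothing has to be prescanned.  The two macros are repeated so that the
fragment stands alone.

```c
#include <assert.h>
#include <string.h>

/* The two macros from the digression, unchanged. */
#define DclCharVar(Vbl, N) struct { \
        int curlen; \
        char curtxt[N]; \
    } Vbl = {0}

#define DclCharCon(Vbl, S) struct { \
        int curlen; \
        char curtxt[-1 + sizeof S]; \
    } Vbl = {-1 + sizeof S, S}

/* Hypothetical helper: append n bytes of src to a counted string whose
   buffer holds cap bytes.  Copies only what fits (an assumption of this
   sketch) and returns the number of bytes actually copied.  No strlen()
   is needed because both lengths are already known. */
static int cv_append(int *curlen, char *curtxt, int cap,
                     const char *src, int n)
{
    int room = cap - *curlen;

    assert(room >= 0);
    if (n > room)
        n = room;
    memcpy(curtxt + *curlen, src, (size_t)n);
    *curlen += n;
    return n;
}

/* Each Dcl... use declares its own anonymous struct type, so a macro
   passes the fields along instead of a pointer to the struct. */
#define CvAppend(Dst, Src) \
    cv_append(&(Dst).curlen, (Dst).curtxt, (int)sizeof((Dst).curtxt), \
              (Src).curtxt, (Src).curlen)
```

With the declarations of a and b shown earlier, CvAppend(a, b) would copy
b's 15 bytes into a and set a.curlen accordingly; a second CvAppend(a, b)
would copy only the 5 bytes that still fit in a's 20.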
END DIGRESSION)

It was extremely painful for him to do this; none of the built in
functions was quite what he wanted, and if you wrote your own string
functions they ran an order of magnitude slower than the built in ones.
(Your functions did not exploit special hardware and the compiler didn't
know how to optimise them.)

As an exercise in learning C, I implemented the same editor on a
PDP-11/60 running V6+ UNIX.  It took me 3 days to write and 2 more to
debug, and was about 12 pages long compared with my friend's 60.  It was
also faster on the 11/60 than my friend's program on an IBM 4331 (I
think it was a 4331; might have been bigger).

What happened?  All things considered, I think my friend was a better
programmer than I was.  The point was that he was starting from a
STRINGS language where the built-in functions were fast but anything
else was hard, whereas I was starting from a CHARACTERS language where I
was able to synthesise precisely the operations that I needed (3 pages
to define the ``string'' functions I wanted).

If you insist on seeing array-of-byte-with-NUL-terminator as *THE*
equivalent in C of strings, you are going to be in big trouble.  The C
library actually supports THREE different representations:

    unbounded-array-of-byte-with-NUL-terminator      (str* functions)
    at-most-N-bytes-with-NUL-terminator-if-short     (strn* functions)
    array-of-exactly-N-bytes                         (mem* functions)

VMS C programmers use a fourth representation, "descriptor", which has
extensive support in the VMS runtime library.  C provides direct syntax
for literals of only one type, but as I showed above, it isn't hard to
come up with macros to declare named constants of the other types.
(VMS C already has such a macro for "descriptors".)

If you insist on doing text processing with strings, you are making a
pretty big mistake no matter what language you are using.
For example, in Lisp and Prolog I have been able to reduce program costs from O(N**2) to O(N) by switching from "string" representation of character sequences to linked lists. In general, it is wise to use "implicit" representations for character sequences where you can. Instead of constructing strings and writing them out, construct trees of some sort and have a tree-walker that sends out the characters without putting them in a string.
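
A minimal sketch of that last idea in C, under stated assumptions: the
node layout (struct seq), the emit_fn callback type, and the names
seq_walk, emit_to_file and emit_to_buf are all hypothetical, invented for
this illustration.  A leaf holds a counted run of bytes, an interior node
is the concatenation of its children, and the walker sends characters
straight to a sink without ever assembling the whole text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node type: a leaf has no children and carries len bytes
   of text (not NUL-terminated); an interior node means left followed
   by right. */
struct seq {
    const struct seq *left, *right;   /* both NULL for a leaf */
    const char *text;                 /* leaf bytes */
    size_t len;                       /* number of leaf bytes */
};

/* A sink receives one character at a time plus a caller-supplied context. */
typedef void emit_fn(int ch, void *ctx);

/* Walk the tree left to right, handing each character to emit. */
void seq_walk(const struct seq *s, emit_fn *emit, void *ctx)
{
    size_t i;

    assert(emit != NULL);
    while (s != NULL) {
        if (s->left == NULL && s->right == NULL) {
            for (i = 0; i < s->len; i++)
                emit((unsigned char)s->text[i], ctx);
            return;
        }
        seq_walk(s->left, emit, ctx);
        s = s->right;   /* loop on the right child instead of recursing */
    }
}

/* Sink that writes to a stdio stream passed as ctx. */
void emit_to_file(int ch, void *ctx)
{
    putc(ch, (FILE *)ctx);
}

/* Sink that collects into a fixed buffer; handy for checking the walker.
   Characters beyond cap are silently dropped in this sketch. */
struct buf {
    char *data;
    size_t used, cap;
};

void emit_to_buf(int ch, void *ctx)
{
    struct buf *b = ctx;

    if (b->used < b->cap)
        b->data[b->used++] = (char)ch;
}
```

Concatenating two sequences is then just a matter of filling in one
interior node, whatever their lengths, and seq_walk(&root, emit_to_file,
stdout) writes the result out.  Whether this wins in practice depends on
how often the text is rebuilt compared with how often it is scanned.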