Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!tut.cis.ohio-state.edu!snorkelwacker.mit.edu!hsdndev!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.lang.perl
Subject: Re: Fast way to join lines ?
Message-ID: <10279:Dec220:47:5990@kramden.acf.nyu.edu>
Date: 2 Dec 90 20:47:59 GMT
References: <9830001@hpfcso.HP.COM> <2967:Dec122:39:3790@kramden.acf.nyu.edu> <109688@convex.convex.com>
Organization: IR
Lines: 157

In article <109688@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes:
> In article <2967:Dec122:39:3790@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> :In article <9830001@hpfcso.HP.COM> hai@hpfcso.HP.COM (Hai Vo-Ba) writes:
> :> I am using perl to join every N lines of a very large file
> :> together and wonder what is the faster way to do this:
  [ some code ]
> Dan then writes:
> :The faster way is something like this:
  [ a similar amount of C code ]
> Well, I'm afraid we've got just a few problems here.

Before I respond to your comments, here's a very quick essay on the
point of comp.lang.perl.

A couple of weeks ago, someone posted a 100-line program to comp.lang.c
and asked the net to debug it for him. Doug Gwyn gently reminded him
that just because a programming problem happens to be in C doesn't mean
it has anything to do with comp.lang.c. He then answered the question
anyway, pointing out the bugs in the program. Henry Spencer also went
out of his way to say how inappropriate the posting was.

Some months back, several people said in news.groups that the proposed
comp.lang.perl would carry a lot of inappropriate content. Somebody even
suggested comp.sources.perl. They were right. Although Larry, Randal,
and Tom try hard to make it otherwise, this group is flooded with
articles having no more to do with the Perl language than that debugging
problem had to do with C.

There aren't many groups dedicated to programming as an issue in itself.
Sure, there's comp.unix.programmer for UNIX-specific programming, and
rec.games.programmer for games programming, and comp.software-eng for
theoretical crap. But there's nowhere a programmer can turn if he wants
to get advice on a programming problem that doesn't have to do with
UNIX, or games, or whatever.

What happens? Joe Shmoe says ``It's in C! So I'll post to comp.lang.c!''
Or ``It's under UNIX! So I'll post to comp.unix.programmer!'' Or ``It's
in Perl! So I'll try comp.lang.perl!''

So much for the essay. What happened here? Someone posted a problem to
comp.lang.perl. I looked at it and said ``Oh, he's reading in lines, and
taking the line number mod 32. But he's not manipulating the lines. So
why doesn't he just copy the input to the output, leaving off the
newlines except every 32nd? And he should use a rotating counter instead
of taking mods, so he can handle any size file, and doesn't have to
worry about machines with slow division.''

In other words, I treated it as a programming problem. I responded
likewise. At first I just wrote what I had above. Then I decided that a
program would make more effective exposition than English text. Since I
program better in C than in Perl, I stuck to C.

Did that make my article appropriate for comp.lang.c? No. A general
programming problem is never appropriate for a language newsgroup,
unless some language feature greatly affects the coding technique. Just
because something is C code doesn't make it right for comp.lang.c. And
just because something is in Perl doesn't make it right for
comp.lang.perl.

> The first one is that the C code doesn't do what the Perl code does,
  [ the Perl version can read files, my version reads stdin ]

BFD. He can cat the original files, then pipe them through the filter.
Or on any system he can add five lines of argument processing to the
program.

(I've been trying to convince Berkeley to add a library for this job to
BSD. If they do, it could be standard by 1997... [grin])

(It is true that the code does something different, btw: it runs t in a
cycle of length 33, to point out that a rotating counter is always fast,
while % or & won't be very fast if the cycle length isn't a power of 2.
Of course, an optimizer with really smart reduction could do this
transformation for itself.)

> The second problem is that (as I mentioned before) while it's good to
> maintain perspective of using the right tool for the job at hand, this
> *IS* comp.lang.perl, and the poster seemed to be clearly searching for a
> perlian solution to his problem.

Read my essay above. I'm quite sure that if I had used words (``Just
copy characters to the output. Rotate a counter on each newline; only
print the newline if the counter is 0.'') you wouldn't be complaining.
Now you're offended because I decided that C code would illustrate this
more effectively than words?

When someone posts an article in English, some Germans get a translator.
Those who can read English appreciate that the poster had something to
say, and didn't know how to say it as effectively in German as in his
native language. Those who were also brought up in English don't even
think about the choice of language; they just pay attention to the point
at hand. So what if the article was posted to alt.prose?

> Look at it this way: if I hung around comp.lang.c and kept posting Perl
> solutions to people's C questions, it would eventually grate on people's
> nerves.

No: it grates on people's nerves when people ask general programming
problems in comp.lang.c. But once a thread has been established in the
wrong group, it's more polite to stick to that decision than to split
off into another inappropriate group.

> We've had a very flame-free, productive little
> group here since its inception, so let's keep it that way, OK?

You're the one who started flaming.
I was just answering a programming question.

> The third problem is that the poster asked for a faster way. There are
> several interpretations of faster, including but not limited to faster
> writing time, faster compile time, faster debugging time, and faster run
> time.

That's not a problem with my code; it's a problem with your definition
of ``faster.'' The only objective measure is faster run time, and by
that measure I did answer the question. Compile time depends on what you
mean by ``compile''---I'd say Perl keeps compiling every time you use
the code, while you only need to compile C when you change it. Writing
time and debugging time are quite subjective---for you, a Perl solution
may be faster to write, but for me, a C solution is faster. Why not
stick to the objective terms?

  [ ... ]
>     perl -pe 'chop if $. % 32'
> or else
>     perl -pe 'chop if $. & 31'

Now that's showing people how to use Perl more effectively. But suppose
I had seen your answer first, and wanted to say that (at least in most
languages) it's faster to process characters instead of lines? The
original question was about making the program run faster. Petty Perl
optimizations are cute, but an improved algorithm is more effective.

  [ C: 0.450524 real 0.340401 user 0.054859 sys ]
  [ Perl: 0.684193 real 0.450535 user 0.083110 sys ]

I agree, 50% slower is respectable for a general tool. But it's still
50% slower.

> As far as I'm concerned, and I'll bet you this goes for most of the rest
> of the readership of this newsgroup as well, anything that you can express
> as a quick one-liner without having to go into an editor (let alone
> compile an a.out!) is worth doing that way.

Yes, it's worth doing that way. But it's worth even more to recode these
things in C. It took me thirty seconds from start to finish to write and
compile that code. If the program is used more than 150 times, it's
worth it.
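
  [ The C code from the earlier article is elided in this thread. For
    readers who want to see the technique, here is a minimal sketch of
    the character-copying filter with a rotating counter, as described
    in words above. It is a reconstruction under assumptions, not the
    code that was posted: the function name join_lines and its cycle
    parameter are made up for illustration. ]

```c
#include <stdio.h>

/*
 * Hypothetical helper (name and signature are illustrative only).
 * Copy every character from `in` to `out`, but swallow each newline
 * unless it is the cycle-th one since the last newline that was kept.
 * The counter t rotates through 0..cycle-1, so no % or & is needed
 * and the file can be arbitrarily long.
 */
void join_lines(FILE *in, FILE *out, int cycle)
{
    int c;
    int t = 0;              /* rotating newline counter */

    while ((c = getc(in)) != EOF) {
        if (c == '\n') {
            if (++t == cycle)
                t = 0;      /* wrap around: this newline is kept */
            if (t != 0)
                continue;   /* not the cycle-th newline: drop it */
        }
        putc(c, out);
    }
}
```

  [ Calling join_lines(stdin, stdout, 32) from main should give roughly
    the behaviour of the Perl one-liners quoted above; a cycle of 33
    matches the variant mentioned earlier in this article. Note that a
    trailing partial group ends without a newline, since its newlines
    are all dropped. ]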

> Furthermore, it's a lot more legible
> because its complexity is drastically reduced, which means it'll be more
> maintainable as well.

Oh? I look at the C program and see ``Process each character. Skip all
but every 33rd newline. Copy to output.'' These C idioms are much more
familiar to me than the mere definition of Perl's ``chop''. So to me,
and probably to lots of other C programmers, the C code is much more
maintainable.

---Dan