Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!decwrl!elroy.jpl.nasa.gov!jpl-devvax!lwall
From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Newsgroups: comp.unix.shell
Subject: Re: Breaking large file into pieces
Message-ID: <9486@jpl-devvax.JPL.NASA.GOV>
Date: 13 Sep 90 00:54:35 GMT
References: <1990Sep11.134238.20218@dg-rtp.dg.com> <1990Sep11.200555.14626@iwarp.intel.com> <26116@boulder.Colorado.EDU>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA
Lines: 46

In article <26116@boulder.Colorado.EDU> skwu@spot.Colorado.EDU.Colorado.EDU (WU SHI-KUEI) writes:
: The right tool for the job is NOT perl but 'csplit'.

"Those words fall too easily from your lips." --Gandalf

Let us attempt to distinguish fact from dogma.

1) As far as I can tell, csplit is AT&T proprietary. I certainly don't have it on all my machines, and don't know offhand where I'd find the source for it. The person we were advising may well not have it on his machine. You should at least say "If you have csplit..."
2) The man page for csplit (in the AT&T universe of a Pyramid, anyway) indicates that you can have a maximum of 99 output files. The application in question could easily have more than that, judging by how it was specified. A general tool should not have such limitations.
3) csplit won't name the files in the way specified--you'd have to follow it up with a loopful of mv commands, one process per file. And in the naive implementation, you'd have a sed or awk for each file to extract the filename to hand to mv.
4) csplit can't recognize patterns across newlines (not that this job required that, but a general tool shouldn't have such limitations).
5) csplit can get confused on lines longer than 255 chars. It can't handle embedded nulls. A general tool should not have such limitations.
6) Even if I did manage to find a freely available source for csplit, I'd have to worry about recompiling it on all my different architectures. That would be okay (after all, I have to do that with Perl too), but I have to do it for 50 blue jillion other little "must have" tools too. I'd much rather compile Perl once on each architecture, rewrite csplit in Perl, throw it into my /u/scripts directory that's mounted everywhere, and never worry about recompiling csplit again.

So it's not quite so simple as all that. You can chop down a tree with a hatchet, but sometimes you want an industrial strength Swiss Army Chainsaw. And sometimes not. There's more than one way to do it.

Larry
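For illustration, here is a minimal sketch of the kind of general splitter the list above argues for. It is written in Python rather than the Perl the post has in mind, and it is not the script from the original thread. The header format (a line like `=== name ===` whose captured text becomes the output file name) is a hypothetical assumption, since the exact specification from the thread is not quoted here; `split_pieces` and `write_pieces` are likewise hypothetical names. The sketch targets points 2 to 5: there is no cap on the number of pieces, each piece is named directly from its header line (no trailing loop of mv commands), the regex runs over the whole buffer so a header pattern may span newlines, and the data is handled as raw bytes so long lines and embedded NULs pass through untouched.

```python
from __future__ import annotations

import re
from pathlib import Path


def split_pieces(data: bytes, header: bytes) -> list[tuple[str | None, bytes]]:
    """Split `data` at every match of the `header` regex.

    `header` must contain one capture group holding the output file name.
    Each piece starts with its own header match (as csplit's pieces start
    with the matching line). Any text before the first header is returned
    under the name None so that nothing is silently dropped.
    """
    rx = re.compile(header, re.MULTILINE)  # ^ and $ match at line boundaries
    pieces: list[tuple[str | None, bytes]] = []
    name: str | None = None  # None marks the preamble before the first header
    start = 0
    for m in rx.finditer(data):  # regex sees the whole buffer, newlines included
        chunk = data[start:m.start()]
        if name is not None or chunk:  # skip an empty preamble
            pieces.append((name, chunk))
        name = m.group(1).decode()
        start = m.start()
    chunk = data[start:]
    if name is not None or chunk:
        pieces.append((name, chunk))
    return pieces


def write_pieces(pieces: list[tuple[str | None, bytes]], outdir: Path,
                 preamble_name: str = "preamble") -> list[Path]:
    """Write each piece to outdir/<name>; later duplicates overwrite earlier ones."""
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, chunk in pieces:
        # Keep only the final path component so a header cannot write outside outdir.
        base = preamble_name if name is None else Path(name).name
        if not base:
            raise ValueError(f"empty file name in header: {name!r}")
        target = outdir / base
        target.write_bytes(chunk)  # binary mode: no line-length limit, NULs preserved
        written.append(target)
    return written
```

A typical call would be `write_pieces(split_pieces(Path("big.txt").read_bytes(), rb"^=== (\S+) ===$"), Path("out"))`. The design trades memory for simplicity: reading the whole file at once is what lets a pattern match across newlines, but a very large input would call for a streaming, line-at-a-time version instead.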