Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!zaphod.mps.ohio-state.edu!ncar!elroy.jpl.nasa.gov!jpl-devvax!lwall
From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Newsgroups: comp.unix.shell
Subject: Re: Breaking large file into pieces
Message-ID: <9466@jpl-devvax.JPL.NASA.GOV>
Date: 11 Sep 90 19:38:42 GMT
References: <1990Sep11.134238.20218@dg-rtp.dg.com>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA
Lines: 57

In article <1990Sep11.134238.20218@dg-rtp.dg.com> monroe@dg-rtp.dg.com (Mark A Monroe) writes:
: I want to rip a large file into pieces, naming new files according
: to an ID string in the large file.  For example, the large file contains
: records that look like this:
:
: xxx-00001239 data data data
: description
:       .
:       .
: (variable length)
:       .
:                   <---blank line
: xxx-00001489 data data data
: description
:       .
:       .
: (variable length)
:       .
:                   <---blank line
: xxx-00001326 data data data
:
: When I find a line in the large data file that starts
: with "xxx-0000", I want to open a file named "xxx-0000",
: like "xxx-00001489", and write every line, including
: the current one, into it.  When I see another "xxx-0000",
: I want to close the file, open a new file named for the new id
: string, and continue writing.  At the end of the large data
: file, close all files and exit.
:
: Any suggestions?

In standard shell+awk+sed it's a bit hard because you run out of file
descriptors.  You could do something like run sed over your file to turn it
into a giant script of here-is commands, but that'll be real slow.

You could do something like this:

    while read line; do
        case "$line" in
        xxx-0000*) set $line; exec >$1;;
        esac
        echo "$line"
    done

But how well that works depends on the vagaries of your echo command, such
as what it does with lines starting with '-', or containing '\c'.  You don't
really want to do this on a machine where echo isn't a builtin.
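
On a shell that has printf and "read -r" (an assumption; neither is mentioned in the post, and older systems may lack them), the echo problem described above can probably be sidestepped. The following is only a sketch of that idea: the function name split_records is made up for illustration, and it assumes the key is everything up to the first space on the line.

```shell
# split_records: hypothetical helper name, not from the original post.
# Reads records on stdin and writes each one to a file named after its key.
split_records() (
    # The body is a subshell (note the parentheses), so the exec
    # redirections below do not leak into the calling shell.
    while IFS= read -r line; do
        case "$line" in
        xxx-0000*)
            # Assumption: the key is everything before the first space.
            key=${line%% *}
            exec >"$key"
            ;;
        esac
        # printf '%s\n' writes the line verbatim, so a leading '-' or an
        # embedded '\c' is not interpreted the way some echo commands would.
        printf '%s\n' "$line"
    done
)
```

It would be run as "split_records < filename". As with the loop above, any lines that come before the first key go to the original standard output, and each new key truncates an existing file of the same name.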

If you have Perl, your fastest solution will be to say something like

    perl -pe 'open(STDOUT,">$&") if /^xxx-0000\d+/' filename

Change > to >> if the keys aren't unique in your input file.

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov