Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!udel!haven!uvaarpa!mmdf
From: eichin@athena.mit.edu (Mark W. Eichin)
Newsgroups: comp.lang.perl
Subject: re: Splitting by character count in perl?
Message-ID: <1991Mar7.002047.5121@uvaarpa.Virginia.EDU>
Date: 7 Mar 91 00:20:47 GMT
Sender: mmdf@uvaarpa.Virginia.EDU (Uvaarpa Mail System)
Reply-To: eichin@athena.mit.edu
Organization: The Internet
Lines: 72

[My apologies for the tutorial style below; I'm writing this for the
reader that doesn't know perl at all, but needs to use it. I welcome
technical corrections publicly, and style comments privately...]

pack/unpack does exactly what you want. The man page isn't all that
clear on this, though I think the Camel Book has examples which make
it clear... the pack string is almost exactly analogous to the FORMAT
statement in Fortran (or rather, FORTRAN, since I mean the "classic"
versions as opposed to the new standards effort) to the extent that
someone could probably write a translator with little difficulty.

As for your particular example, the one-liner:

    perl -ne '@two=unpack("a78a*",$_); print "A",$two[0],"\nB",$two[1];'

should do it. Data follows.

unpack takes the current line ($_) and unpacks it into a string of 78
chars and a string of "the rest" (*), leaving the results in an array
called "two" (@two). Then it prints the A, the first element of @two
($two[0]; arrays start at zero by default, like C, though you can set
a variable to adjust that), then the newline and the B ("\nB"), and
then the second element of @two, which *already* contains the trailing
newline: $_ is the *entire* line, and we never did a chop to split off
the newline, so it is still there. (Using "a78a72" would also have
chopped off the newline, as it is the 151st character.) The -n wraps a
loop around the whole thing; the -e indicates that we're putting the
code right here on the command line instead of off in a script.
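To make the newline point concrete, here is a minimal sketch comparing the two templates on one sample record. The record is built in the script rather than read from input, and the variable names ($line, @rest, @fixed) are mine, purely for illustration:

```perl
use strict;
use warnings;

# Sample record (an assumption matching the data described in this post):
# 78 '=' then 72 '*', plus the trailing newline an input line still carries.
my $line = ("=" x 78) . ("*" x 72) . "\n";

# "a78a*": 78 characters, then whatever is left, newline included.
my @rest = unpack("a78a*", $line);

# "a78a72": 78 characters, then exactly 72; the newline (character 151)
# is simply never unpacked.
my @fixed = unpack("a78a72", $line);

print "a*  field length: ", length($rest[1]),  "\n";   # 73 (72 stars + newline)
print "a72 field length: ", length($fixed[1]), "\n";   # 72

# What -n wraps around the -e code is a loop of this shape:
#     while (<>) { ...the -e code... }
```

Run as a script, this prints 73 for the "a*" field and 72 for the "a72" field, which is why the first one-liner needs no explicit "\n" at the end of its print.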

I hope this helps; I didn't really want to provide a naked one-liner,
thus the windy explanation. The *important* thing, of course, is that
running the above line, then feeding it the following three lines of
data (78 equals + 72 stars each):

==============================================================================************************************************************************
==============================================================================************************************************************************
==============================================================================************************************************************************

yields:

A==============================================================================
B************************************************************************
A==============================================================================
B************************************************************************
A==============================================================================
B************************************************************************

Hmmm. Double checking your note, you want the A and B *appended* as
well - Ok, fine, I'll leave the above because it makes a point about
newlines, and submit:

    perl -ne '@two=unpack("a78a72",$_); print "A",$two[0],"A\nB",$two[1],"B\n";'

A==============================================================================A
B************************************************************************B
A==============================================================================A
B************************************************************************B
A==============================================================================A
B************************************************************************B

Items for further exploration:

a) the reassembly could be done with pack.
b) if the line is less than 150 columns, so will the output. I
   suspect the fortran code had the same problem - and that the data
   *doesn't* have that problem. See what pack("A78") does, and note
   how it would solve that problem.
c) There is a substr function, but you'd have to use it twice; would
   that be slower? [probably, since it would still have to create the
   temporary values - but it might be more memory efficient, though
   not by enough to matter in this example.]

Enjoy...
				_Mark_
MIT Student Information Processing Board
Watchmaker Computing
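
A starting point for exploration items a) through c) above, offered as a rough sketch rather than a finished answer. The sample record, the ten-character short record, and all variable names are assumptions made up for the illustration:

```perl
use strict;
use warnings;

# Sample record (assumed): 78 '=' then 72 '*', plus the newline.
my $line = ("=" x 78) . ("*" x 72) . "\n";

# (c) Two substr calls in place of one unpack("a78a72", ...).
my $eq = substr($line, 0, 78);    # characters 1..78
my $st = substr($line, 78, 72);   # characters 79..150

# (a) Reassembly with pack: a bare "a" takes one character, "a2" takes two,
# so the template lays out  A <78> A\n B <72> B\n  in order.
my $out = pack("aa78a2aa72a2", "A", $eq, "A\n", "B", $st, "B\n");
print $out;

# (b) pack with "A78" pads a short value with spaces up to 78 columns,
# so a short record still yields a full-width field.
my $short  = "=" x 10;            # hypothetical record well under 150 columns
my $padded = pack("A78", $short);
print "padded length: ", length($padded), "\n";   # 78
```

Whether the substr version is actually slower than unpack is left unmeasured here; timing both on real data would be the way to find out.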