Path: utzoo!attcan!uunet!jarthur!elroy.jpl.nasa.gov!jpl-devvax!lwall
From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Newsgroups: comp.unix.questions
Subject: Re: Stripping "hard returns" from UNIX mail files
Message-ID: <10157@jpl-devvax.JPL.NASA.GOV>
Date: 29 Oct 90 21:40:31 GMT
References: <1990Oct23.135606@casbs.Stanford.EDU> <657182670.22483@ontmoh.UUCP>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA
Lines: 54

In article <657182670.22483@ontmoh.UUCP> peter@ontmoh.UUCP (Peter Renzland) writes:
: patrick@casbs.Stanford.EDU (Patrick Goebel) asks for a UNIX utility
: to remove "hard returns" from mail messages for subsequent processing
: by MS-DOS wordprocessors.
:
: Unix considers it natural for text to be made up of lines, and all
: programs that do useful things with text assume that such lines are
: within some reasonable limit.

Painting with a broad brush here, aren't you?  Both Gnu emacs and Perl agree
that the only "reasonable limit" on line length is the amount of swap space
available on your machine.

: This corresponds to things that naturally
: contain lines (text in books or on your display, or on typewriter, or
: a line printer), and those things, naturally, have limits on the line
: length.
:
: The RETURN key, and its code, is an implementation of the typewriter's
: "carriage" return.

Fair enough.  But someday we have to escape the typewriter/punchcard metaphor.
Word processors are just beginning to get us out of this straitjacket.

: Text which is thus made up of lines can easily be formatted in all sorts
: of ways.  But, if we format it so that we have (limitless) multiline
: paragraphs and no longer any line separators, some of our programs that
: are so handy with lines of text may break in the face of possibly huge
: paragraphs.

So rewrite the programs so they aren't busted.

: Having said that, you could try something like this little program:
:
: awk '
: NF==0 { if(LINE) { print LINE ; LINE="" } ; print ; next}
: { if(LINE) LINE=LINE " " $0 ; else LINE=LINE $0 }
: END { print LINE }
: ' $*

I think gawk will now handle "infinite" lines, but older awks will blow up
on longer paragraphs.  It would be a tad nicer if it threw in an extra space
after lines that end a sentence.

: I would prefer to use the PC wordprocessor's text import facilities to
: take standard line-oriented text and convert it to its own paragraph
: format.

Some of us don't have such a clever importer.  Yeah, I know, rewrite the
programs...

Sigh.

Larry
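
The "extra space after lines that end a sentence" idea can be sketched as
follows.  This is a minimal Python illustration, not a program from the thread:
the names join_paragraphs, join_lines and SENTENCE_END are made up for the
example, and the rule for what counts as the end of a sentence is an assumption.
Unlike the quoted awk script, it strips leading and trailing whitespace from each
line and emits blank lines as empty strings.

```python
import re

# Assumption: a line "ends a sentence" if it finishes with . ! or ?,
# optionally followed by closing quotes or brackets.
SENTENCE_END = re.compile(r'[.!?]["\')\]]*$')


def join_lines(parts):
    """Join the lines of one paragraph into a single long line.

    One space goes between ordinary lines, two spaces after a line
    that appears to end a sentence.
    """
    pieces = [parts[0]]
    for prev, cur in zip(parts, parts[1:]):
        pieces.append("  " if SENTENCE_END.search(prev) else " ")
        pieces.append(cur)
    return "".join(pieces)


def join_paragraphs(lines):
    """Yield one line per paragraph; blank lines separate paragraphs."""
    parts = []  # stripped lines of the paragraph being collected
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank (or whitespace-only) line: flush the paragraph, keep the gap.
            if parts:
                yield join_lines(parts)
                parts = []
            yield ""
        else:
            parts.append(line)
    if parts:
        yield join_lines(parts)
```

Passing sys.stdin to join_paragraphs and printing each result would turn this
into a filter.  Line length is bounded only by available memory, since each
paragraph is built as an ordinary string.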