Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!gatech!dcatla!itwaf
From: itwaf@dcatla.UUCP (Bill Fulton [Sys Admin])
Newsgroups: comp.unix.questions
Subject: Re: sed script to remove cr/lf except at paragraph breaks
Keywords: sed msdos
Message-ID: <19207@dcatla.UUCP>
Date: 23 May 89 00:59:31 GMT
References: <119@sherpa.UUCP> <1292@aplcen.apl.jhu.edu>
Reply-To: dcatla!itwaf@gatech.edu (Bill Fulton [Sys Admin])
Distribution: na
Organization: DCA Inc., Alpharetta, GA
Lines: 28

In article <119@sherpa.UUCP> rac@sherpa.UUCP (Roger A. Cornelius) writes:
> I'm in need of a sed script to remove MSDOS cr/lf (actually replace each
> cr/lf combination with one space) except at the start of a paragraph.
> i.e. only the cr/lf preceding a paragraph break should remain.  Paragraphs
> are marked only by four leading spaces and nothing else.
> Here's where I am now:
> [ sed script deleted]

How about lex, instead?  I think the lex input between these lines:
----------
%%
\015\012"    "    ECHO;
\015\012          { strcpy(yytext, " "); ECHO; }
----------
should do what you want.  Make it with 'lex ; cc lex.yy.c -ll', then feed
a.out your MSDOS file(s)!  You could append a functions section to do setup,
or you could drive it from a front-end script.

I don't want to turn this into a lex vs. sed thing, but it does seem that
lex would be much more direct and easy.  I agree that lex is "well ... a
little strange" if you don't work with it a lot, but once you start to mess
around with sed scripts such as you have, it starts to balance out.  Once I
played with it a little, I decided that lex is pretty neat as a standalone
utility!

Bill Fulton
dcatla!itwaf@gatech.edu  OR  ..!gatech!dcatla!itwaf
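
For anyone who wants to see what the two lex rules in the post amount to without running lex, here is a rough sketch of the same logic as a plain C function. This is only an illustration, not the code lex would generate: the name join_crlf and its buffer-based interface are made up for this example, and it assumes a paragraph break is exactly CR LF followed by four spaces, as the quoted article describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): copy `in` to `out`,
 * replacing each CR LF pair with one space, unless the pair is followed
 * by four spaces, in which case it is copied through unchanged.
 * `out` must have room for strlen(in) + 1 bytes; the result is never
 * longer than the input. Returns the length of the result. */
size_t join_crlf(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in[0] == '\r' && in[1] == '\n') {
            if (strncmp(in + 2, "    ", 4) == 0) {
                /* Paragraph break: keep CR LF and the four-space indent,
                 * like the first lex rule (the longer match wins). */
                memcpy(out + n, in, 6);
                n += 6;
                in += 6;
            } else {
                /* Ordinary line end: emit one space, like the second rule. */
                out[n++] = ' ';
                in += 2;
            }
        } else {
            /* Everything else is copied as-is, like lex's default action. */
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return n;
}
```

As with the lex version, the CR LF that is kept at a paragraph break stays an MSDOS line ending; turning it into a bare newline would be a separate step.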