Newsgroups: comp.unix.questions
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!ispd-newsserver!garden.ssd.kodak.com!weimer
From: weimer@garden.ssd.kodak.com (Gary Weimer (253-7796))
Subject: Re: Eliminating Duplicate Mail Headers
Message-ID: <1991May6.193538.6235@ssd.kodak.com>
Sender: news@ssd.kodak.com
Reply-To: weimer@ssd.kodak.com
Organization: Eastman Kodak Co.; Rochester, NY
References: <13@oss670.UUCP> <1669@aupair.cs.athabascau.ca> <12817@exodus.Eng.Sun.COM>
Date: Mon, 6 May 91 19:35:38 GMT

>I'm not able to fix the mailer myself, but can pass its output
>through standard filters--awk, sed, etc.--before it goes
>out the door.  My first thought was to pass things through 'uniq',
>but this would also delete consecutive identical lines in the body (the
>mailer doesn't distinguish between header and body).  The probability
>of consecutive, identical lines in the body of mail messages seems
>low, but not low enough to chance this.

Since I haven't seen a non-perl solution that works yet, here's mine.
Actually I have two (don't ask me why). The second is more robust and
handles all examples in the test file.

============ Start test file =======================
This is the first line
	First continued line
	Another continued line
	Another continued line with extras
A repeated line
A repeated line
A repeated line
	with continuation
A repeated line
	with continuation
One more line

Body of message
Body of message
More lines

2nd paragraph
Body of message
Body of message
More lines
============ End test file =======================

============ Start 1st solution file =======================
#!/bin/awk -f

# assumes first line is not blank (doesn't modify header if it is)
# assumes continuation lines do not make a "line" unique, i.e.
#       A line followed by
#               a continuation line
# is a "duplicate" of:
#       A line followed by
#               a different continuation line

BEGIN{cont = "\t"}              # tab is continuation character
/^$/,0 {                        # body: from first blank line to end of
        print $0; next}         # input (end pattern 0 never matches)
substr($0,1,1) == cont {        # don't print continuation line if first
        if (!del) {print $0}    # part of line was a repeat
        next}
prev == $0 {                    # this and any continuation is repeat
        del = 1; next}
{                               # print line since not repeat
        del = 0; print $0; prev = $0}
============ End 1st solution file =======================

============ Start 2nd solution file =======================
#!/bin/awk -f

# skips blank lines at start of file (can be printed)
# compares continuation lines

BEGIN{contflg = "\t"}           # tab is continuation character
{if (!fndhdr){                  # handle blank lines before header
        if ($0 == ""){
#               print $0;       # print blank lines before header
                next}
        else{
                fndhdr = 1}}}
/^$/,0 {                        # body: from first blank line to end of
        print $0; next}         # input (end pattern 0 never matches)
substr($0,1,1) == contflg {
        if (nm != 0 && nm < np && prev[nm+1] == $0){
                                # still seems to be repeat
                nm++}
        else{                   # line is not a repeat
                if (nm == 0){   # we already knew was not repeat
                        np++}
                else{
                        for (i=1; i<=nm; i++)   # print what we thought
                                print prev[i];  # was a repeat
                        np = nm + 1; nm = 0}
                print $0; prev[np] = $0}        # keep track of continuation lines
        next}
prev[1] == $0 {                 # assume line is repeat
        nm = 1; next}
{                               # print line since not repeat
        nm = 0; print $0; np = 1; prev[np] = $0}
============ End 2nd solution file =======================
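
For comparison, here is a minimal sketch of the same idea for a POSIX awk
(one with user-defined functions, which not every awk has). It is not either
of the solutions above: it glues each header line and its continuation lines
into one field, and prints a field only if it differs from the field just
before it. The names dedup_headers, emit, field, last, have and body are made
up for illustration. It also accepts a leading space as well as a tab as a
continuation marker, which is an assumption beyond the tab-only rule used
above. Like the first solution, it assumes the first line is not blank.

```sh
#!/bin/sh
# dedup_headers: drop a header field that is identical, continuation lines
# included, to the field immediately before it. Everything from the first
# blank line on (the body) is copied through unchanged.
dedup_headers() {
    awk '
    function emit() {           # print the pending field unless it repeats
        if (have && field != last) print field
        if (have) last = field
        have = 0
    }
    body { print; next }                    # already past the header
    /^$/ { emit(); body = 1; print; next }  # blank line ends the header
    /^[ \t]/ && have {                      # continuation of pending field
        field = field "\n" $0; next
    }
    { emit(); field = $0; have = 1 }        # start of a new header field
    END { if (!body) emit() }               # input with no body at all
    ' "$@"
}
```

Something like "dedup_headers < message" should then write the cleaned
message to standard output (the file name is only a placeholder). Comparing
whole fields means a repeat with fewer or different continuation lines is
kept rather than dropped.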