Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!cs.utexas.edu!sun-barr!newstop!texsun!convex!usenet
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.unix.questions
Subject: Re: Eliminating Duplicate Mail Headers
Message-ID: <1991May01.234739.25672@convex.com>
Date: 1 May 91 23:47:39 GMT
References: <13@oss670.UUCP> <1669@aupair.cs.athabascau.ca>
Sender: usenet@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Organization: CONVEX Software Development, Richardson, TX
Lines: 62
Nntp-Posting-Host: pixel.convex.com

From the keyboard of lyndon@cs.athabascau.ca (Lyndon Nerenberg):
:[ Tried mailing this but oss670.uucp was unknown to us ]

right, me too.

:In comp.mail.headers you write:
:
:>I'm not able to fix the mailer myself, but can pass its output
:>through standard filters--awk, sed, etc.--before it goes
:>out the door.  My first thought was to pass things through 'uniq',
:>but this would also delete consecutive identical lines in the body (the
:>mailer doesn't distinguish between header and body).  The probability
:>of consecutive, identical lines in the body of mail messages seems
:>low, but not low enough to chance this.
:
:You almost answered your own question :-)
:
:Use sed to split the headers and body into separate files. Run the header
:file through sort|uniq, then append the body file. Note that you will
:have to deal with header continuation lines somehow. A short piece of
:C code should handle folding the headers, and unfolding them when you're
:done.

That's a lot of work!!

:Perhaps the easiest way to deal with this would be to write the entire
:filter in C. All you need to do is maintain a linked list of headers
:you have seen. During the scanning phase, if you encounter a header that's
:already on the linked list, ignore it (and any possible continuation
:lines). If it's a new header, start up a second linked list of lines
:containing the header contents. If there are continuation lines in the
:header, simply append them to the linked list for that header. This
:eliminates the need to fold/spindle/mutilate the header continuation
:lines.

:Once you've fallen out of the headers, just copy the message body
:through and you're done!

That's a HELLUVA lotta work!

Here's an awk solution:

    #!/bin/awk -f
    # The first blank line ends the headers; everything after is body.
    /^$/ { body = 1 }
    {
        if (!body) {
            # In the headers, drop a line identical to the one before it.
            if (lastline == $0)
                next
            lastline = $0
        }
        print
    }

And here's a perl solution:

    perl -ne 'print if (/^$/ .. eof) || $lastline ne $_; $lastline = $_'

If you want solutions for non-consecutive or especially multi-line
headers, ask, but I can lay odds they'll be in perl. :-)

--tom
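
For the non-consecutive, multi-line case mentioned above, here is a minimal sketch in awk (not perl, and not from the original post). It follows the quoted idea of remembering the headers already seen, with an awk array standing in for the linked list. The names `dedup_headers`, `emit`, `hdr`, and `seen` are made up for illustration. It assumes a continuation line is any header line starting with a space or tab, and that two headers are duplicates only when they match exactly, continuation lines included.

```sh
#!/bin/sh
# dedup_headers: hypothetical helper name, not part of the original post.
# Reads a message from stdin (or the named files) and writes it to stdout
# with repeated headers removed; the body is copied through untouched.
dedup_headers() {
    awk '
    # Print the buffered header unless an identical one was printed before.
    function emit() {
        if (hdr != "" && !(hdr in seen)) {
            seen[hdr] = 1
            print hdr
        }
        hdr = ""
    }
    body     { print; next }                   # past the headers: copy verbatim
    /^$/     { emit(); body = 1; print; next } # first blank line ends the headers
    /^[ \t]/ { hdr = hdr "\n" $0; next }       # continuation: append to current header
             { emit(); hdr = $0 }              # a new header starts here
    END      { emit() }                        # message with headers but no body
    ' "$@"
}
```

Used as a filter, something like `dedup_headers < message > message.clean` should work. Unlike sort|uniq, this keeps the first copy of each header in its original position, so there is nothing to fold or unfold afterwards.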