From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.unix.questions
Subject: Re: Eliminating Duplicate Mail Headers
Message-ID: <1991May01.234739.25672@convex.com>
Date: 1 May 91 23:47:39 GMT
References: <13@oss670.UUCP> <1669@aupair.cs.athabascau.ca>
Sender: usenet@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Organization: CONVEX Software Development, Richardson, TX
Lines: 62
Nntp-Posting-Host: pixel.convex.com

From the keyboard of lyndon@cs.athabascau.ca (Lyndon Nerenberg):
:[ Tried mailing this but oss670.uucp was unknown to us ]

right, me too.

:In comp.mail.headers you write:
:
:>I'm not able to fix the mailer myself, but can pass its output
:>through standard filters--awk, sed, etc.--before it goes
:>out the door.  My first thought was to pass things through 'uniq',
:>but this would also delete consecutive identical lines in the body (the
:>mailer doesn't distinguish between header and body).  The probability
:>of consecutive, identical lines in the body of mail messages seems
:>low, but not low enough to chance this.
:
:You almost answered your own question :-)
:
:Use sed to split the headers and body into separate files. Run the header
:file through sort|uniq, then append the body file. Note that you will 
:have to deal with header continuation lines somehow. A short piece of
:C code should handle folding the headers, and unfolding them when you're
:done.

That's a lot of work!!


:Perhaps the easiest way to deal with this would be to write the entire
:filter in C. All you need to do is maintain a linked list of headers
:you have seen. During the scanning phase, if you encounter a header that's
:already on the linked list, ignore it (and any possible continuation
:lines). If it's a new header, start up a second linked list of lines
:containing the header contents. If there are continuation lines in the
:header, simply append them to the linked list for that header. This
:eliminates the need to fold/spindle/mutilate the header continuation
:lines.
:
:Once you've fallen out of the headers, just copy the message body
:through and you're done!

That's a HELLUVA lotta work!

Here's an awk solution:

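    #!/bin/awk -f
    # Drop consecutive duplicate header lines; the body passes through untouched.
    /^$/ { body = 1 }                   # first blank line ends the headers
    {
        if (!body) {
            if (lastline == $0) next    # same as the previous header line
            lastline = $0
        }
        print
    }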

And here's a perl solution:

    perl -ne 'print if (/^$/ .. eof)  || $lastline ne $_; $lastline = $_'


If you want solutions for non-consecutive or especially multi-line
headers, ask, but I can lay odds they'll be in perl. :-)
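
For the record, the non-consecutive, multi-line case can be roughed out
in awk as well. What follows is only a sketch under some assumptions
that are not in the original question: a "duplicate" means a header
whose full text, continuation lines included, matches one already
printed (so several different Received: lines all survive), a
continuation line is any header line starting with a space or tab, and
the awk in use supports user-defined functions (nawk or any POSIX awk).
The names dedup_headers and emit are made up for illustration.

```shell
# dedup_headers: hypothetical helper; reads a message on stdin and
# writes it to stdout with repeated headers removed.
dedup_headers() {
    awk '
    # Print the pending header unless identical text was printed before.
    function emit() {
        if (have && !(hdr in seen)) {
            seen[hdr] = 1
            print hdr
        }
        have = 0
    }
    body { print; next }                          # in the body: copy through
    /^$/ { emit(); body = 1; print; next }        # blank line ends the headers
    /^[ \t]/ && have { hdr = hdr "\n" $0; next }  # continuation of pending header
    { emit(); hdr = $0; have = 1 }                # a new header starts here
    END { emit() }                                # message that had no body
    '
}
```

It would be used as a filter, e.g. "dedup_headers < message > message.new".
The associative array stands in for the linked list of seen headers
described above, and keeping each header together with its continuation
lines in one string avoids any separate fold/unfold pass.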

--tom