Xref: utzoo comp.unix.questions:26763 comp.lang.perl:2889
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!wuarchive!udel!rochester!kodak!ispd-newsserver!weimer
From: weimer@ssd.kodak.com (Gary Weimer)
Newsgroups: comp.unix.questions,comp.lang.perl
Subject: Re: Need help ** removing duplicate rows **
Message-ID: <1990Nov7.205644.7593@ssd.kodak.com>
Date: 7 Nov 90 20:56:44 GMT
References: <1990Oct30.234654.23547@agate.berkeley.edu> <1990Oct31.003627.641@iwarp.intel.com> <10182@jpl-devvax.JPL.NASA.GOV>
Sender: news@ssd.kodak.com
Organization: Eastman Kodak
Lines: 28

In article <10182@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>In article <1990Oct31.003627.641@iwarp.intel.com> merlyn@iwarp.intel.com (Randal Schwartz) writes:
>: In article <1990Oct30.234654.23547@agate.berkeley.edu>, c60b-3ac@web (Eric Thompson) writes:
>: | Sounds like what I need is a way to filter out rows
>: | that are duplicate except in the second column.
>: 
>: A one-liner in Perl:
>: 
>: perl -ne '($a,$b,$c) = split(":",$_,3); print unless $seen{$a,$c}++;'
>: 
>: Fast enough?
>
>Maybe, but he said they were very long files, and that may mean more than
>you'd want to store in an associative array, even with virtual memory.
>Presuming the files are sorted reasonably, you can get away with this:
>
>perl -ne '($this = $_) =~ s/:[^:]*//; print if $this ne $that; $that = $this'
>
>Of course, someone will post a solution using cut and uniq, which will be
>fine if you don't mind losing the second field.  Or swapping the first
>two fields around.  I'll leave the awk and sed solutions to someone else.
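For the record, the cut and uniq version Larry predicts would look roughly like this.  It's a minimal sketch, assuming colon-separated input that is already sorted so duplicate rows sit next to each other; the function name dedup_cut is made up for illustration.  As Larry says, field 2 is gone from the output.

```shell
#!/bin/sh
# dedup_cut: hypothetical helper name, reads stdin and writes stdout.
# Assumes ':'-separated rows, sorted so that duplicates are adjacent.
# cut keeps field 1 and fields 3..end (field 2 is dropped for good),
# then uniq collapses runs of identical adjacent lines.
dedup_cut() {
    cut -d: -f1,3- | uniq
}
```

Usage: dedup_cut < InFile > OutFile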

Who needs sed?

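awk -F: '{ cur = $1
           for (i = 3; i <= NF; i++) cur = cur FS $i
           if (NR == 1 || cur != prev) { prev = cur; print } }' InFile > OutFile

The key is every field except $2, rejoined with ":" so that "a:x:bc" and
"ab:x:c" don't collapse to the same string, and the loop handles any number
of fields.  Like Larry's version, it only catches duplicates on adjacent lines.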
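If the file isn't sorted, Randal's %seen trick carries over to awk, keyed on every field except the second.  This is a sketch under the same assumptions (colon-separated rows; dedup_unsorted is a hypothetical name), and it carries Larry's caveat: every distinct key stays in memory.

```shell
#!/bin/sh
# dedup_unsorted: hypothetical helper name, reads stdin and writes stdout.
# Prints the first row seen for each key, where the key is all
# ':'-separated fields except field 2.  Input order does not matter,
# but the seen[] array grows with the number of distinct keys.
dedup_unsorted() {
    awk -F: '{
        key = $1
        # rejoin with FS so adjacent fields cannot run together
        for (i = 3; i <= NF; i++) key = key FS $i
        if (!(key in seen)) { seen[key] = 1; print }
    }'
}
```

Usage: dedup_unsorted < InFile > OutFile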