Xref: utzoo comp.unix.questions:26585 comp.lang.perl:2803
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!usc!ucsd!ucbvax!iwarp.intel.com!news
From: merlyn@iwarp.intel.com (Randal Schwartz)
Newsgroups: comp.unix.questions,comp.lang.perl
Subject: Re: Need help ** removing duplicate rows **
Message-ID: <1990Oct31.003627.641@iwarp.intel.com>
Date: 31 Oct 90 00:36:27 GMT
References: <1990Oct30.234654.23547@agate.berkeley.edu>
Sender: news@iwarp.intel.com
Reply-To: merlyn@iwarp.intel.com (Randal Schwartz)
Organization: Stonehenge; netaccess via Intel, Beaverton, Oregon, USA
Lines: 32
In-Reply-To: c60b-3ac@web.berkeley.edu (Eric Thompson)

In article <1990Oct30.234654.23547@agate.berkeley.edu>, c60b-3ac@web (Eric Thompson) writes:
| I have a few very long files that contain rows of ASCII data. Each row
| looks something like this (not the actual data here):
|
| a:A:b:c:d:e:f:g:h:i:j:k:l:m
| a:B:b:c:d:e:f:g:h:i:j:k:l:m
| a:C:b:c:d:e:f:g:h:i:j:k:l:m
| a:D:b:c:d:e:f:g:h:i:j:k:l:m
| b:A:n:o:p:q:s:t:u:v:w:x:y:z
| c:A:x:a:x:b:x:c:d:a:m:l:v:x
| d:A:m:l:k:j:i:h:g:f:e:d:c:b
| d:B:m:l:k:j:i:h:g:f:e:d:c:b
| d:C:m:l:k:j:i:h:g:f:e:d:c:b
|
| It's the second column that's important. If there are multiple rows that
| are exactly the same except for the second column, I want to GET RID of them.
| If the row is unique (for example, the ones starting with "b" and "c" above)
| then it should stay. Sounds like what I need is a way to filter out rows
| that are duplicate except in the second column.

A one-liner in Perl:

perl -ne '($a,$b,$c) = split(":",$_,3); print unless $seen{$a,$c}++;'

Fast enough?

print "Just another Perl hacker,"
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel put the 'backward' in 'backward compatible'..."=========/