Xref: utzoo comp.unix.questions:26763 comp.lang.perl:2889
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!wuarchive!udel!rochester!kodak!ispd-newsserver!weimer
From: weimer@ssd.kodak.com (Gary Weimer)
Newsgroups: comp.unix.questions,comp.lang.perl
Subject: Re: Need help ** removing duplicate rows **
Message-ID: <1990Nov7.205644.7593@ssd.kodak.com>
Date: 7 Nov 90 20:56:44 GMT
References: <1990Oct30.234654.23547@agate.berkeley.edu> <1990Oct31.003627.641@iwarp.intel.com> <10182@jpl-devvax.JPL.NASA.GOV>
Sender: news@ssd.kodak.com
Organization: Eastman Kodak
Lines: 28

In article <10182@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>In article <1990Oct31.003627.641@iwarp.intel.com> merlyn@iwarp.intel.com (Randal Schwartz) writes:
>: In article <1990Oct30.234654.23547@agate.berkeley.edu>, c60b-3ac@web (Eric Thompson) writes:
>: | Sounds like what I need is a way to filter out rows
>: | that are duplicate except in the second column.
>:
>: A one-liner in Perl:
>:
>: perl -ne '($a,$b,$c) = split(":",$_,3); print unless $seen{$a,$c}++;'
>:
>: Fast enough?
>
>Maybe, but he said they were very long files, and that may mean more than
>you'd want to store in an associative array, even with virtual memory.
>Presuming the files are sorted reasonably, you can get away with this:
>
>perl -ne '($this = $_) =~ s/:[^:]*//; print if $this ne $that; $that = $this'
>
>Of course, someone will post a solution using cut and uniq, which will be
>fine if you don't mind losing the second field. Or swapping the first
>two fields around. I'll leave the awk and sed solutions to someone else.

Who needs sed?

awk -F: '{cur=$1$3$4$5$6$7$8$9$10$11$12$13$14;if(cur!=prev){prev=cur;print $0}}' InFile > OutFile

NOTE: split to fit in 80 columns--needs to be rejoined
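A minimal sketch of the two strategies discussed in this thread, written as POSIX shell functions around awk and assuming colon-separated fields as in the one-liners above. The function names dedup_sorted and dedup_any are hypothetical. Rather than concatenating $1$3...$14, each version builds its comparison key by deleting the second field from the whole line with sub(), the same substitution as s/:[^:]*// in the Perl version. Because the remaining colons are kept, rows such as ab:1:c and a:1:bc are not mistaken for duplicates, and lines with more than 14 fields are compared in full.

```shell
#!/bin/sh
# Sketch only: remove rows that match another row in every colon-separated
# field except the second. Function names are hypothetical.

# For sorted input: compare each line with the previous one only, so memory
# use stays constant (the same idea as the $this/$that Perl one-liner).
dedup_sorted() {
    awk '{
        key = $0
        sub(/:[^:]*/, "", key)   # drop the first ":" and field 2, keep the rest
        if (NR == 1 || key != prev) print
        prev = key
    }' "$@"
}

# For unsorted input: remember every distinct key seen so far (the same idea
# as the %seen associative array), so memory grows with the number of keys.
dedup_any() {
    awk '{
        key = $0
        sub(/:[^:]*/, "", key)   # same key as in dedup_sorted
        if (!seen[key]++) print
    }' "$@"
}
```

Used as dedup_sorted InFile > OutFile, the first function behaves like the awk command in the post and, like the sorted-input Perl version, only catches duplicates that are adjacent; the NR == 1 test makes sure the first line is printed even when its key is empty, which a bare cur != prev comparison would miss. dedup_any catches duplicates anywhere in the file, at the cost of holding every distinct key in memory, which is the trade-off raised above for very long files.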