Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!texsun!convex!convex.COM
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.lang.perl
Subject: Re: uniq'ing arrays
Message-ID: <109104@convex.convex.com>
Date: 21 Nov 90 15:59:29 GMT
References: <1990Nov21.021344.16038@fxgrp.fx.com>
Sender: news@convex.com
Reply-To: tchrist@convex.COM (Tom Christiansen)
Organization: CONVEX Software Development, Richardson, TX
Lines: 49

In article <1990Nov21.021344.16038@fxgrp.fx.com> grady@postgres.berkeley.edu writes:
=grep is a useful unix utility that has a parallel perl operator.
=What about uniq?  I'd like to see uniq (or something) that strips
=out the duplicates in an array.  It would be cool if it were
=built into perl.

=Meanwhile, has anyone written an efficient routine to do this?
=Linear time would be nice.  I'd like one that doesn't require the
=array to be sorted.  I could write one myself, but I thought
=I'd avoid duplicating code..

This is turning into a FAQ, isn't it?

In <9952@jpl-devvax.JPL.NASA.GOV> on 12 Oct 90, Larry wrote a good 
article on this.  He covers these cases:

1) If @in is sorted:

    $prev = 'nonesuch';
    @out = grep($_ ne $prev && (($prev) = $_), @in);

2) If we don't know whether @in is sorted:

    undef %saw;
    @out = grep(!$saw{$_}++, @in);

3) If we don't know if @in is sorted, nor care whether @out is:

    undef %ary;
    @ary{@in} = ();
    @out = keys(%ary);

(I usually use #3.)
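
Here is a minimal sketch that runs the three cases side by side on a
made-up word list.  The sample data and the names @sorted and
@out1..@out3 are illustrative assumptions, not from Larry's article, and
case 1 assumes the sentinel 'nonesuch' never appears in the data.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my @in = qw(pear apple fig apple pear kiwi fig);

# Case 1: valid only on sorted input, since it compares each element
# with its predecessor.  (($prev) = $_) is a list assignment, which in
# scalar context yields 1, so the test stays true even for "0" or "".
my @sorted = sort @in;
my $prev = 'nonesuch';             # sentinel assumed absent from the data
my @out1 = grep($_ ne $prev && (($prev) = $_), @sorted);

# Case 2: works on unsorted input and keeps order of first appearance.
my %saw;
my @out2 = grep(!$saw{$_}++, @in);

# Case 3: a hash slice makes every element a key; keys() returns them
# in no particular order, so sort here only to get a stable printout.
my %ary;
@ary{@in} = ();
my @out3 = sort keys(%ary);

print "case 1: @out1\n";           # case 1: apple fig kiwi pear
print "case 2: @out2\n";           # case 2: pear apple fig kiwi
print "case 3: @out3\n";           # case 3: apple fig kiwi pear
```

Cases 2 and 3 each make one pass over @in with a hash lookup per
element, which is about as close to the requested linear time as
you'll get.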

And then he points out that if you know that @in contains
only small positive integers, you can use:

    @out = grep(!$saw[$_]++, @in);

for case 2, and for case 3:
    
    @ary[@in] = @in;
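    # Slots never assigned are undefined, so skip them; what is left
    # is already in ascending numeric order.
    @out = grep(defined($_), @ary);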
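
A quick sketch of the small-integer variants, again with made-up data
and illustrative names (@out2, @out3).  One thing to watch in the
array-slice version: indices that never occur in @in are left
undefined, so this sketch filters them out with grep(defined($_), ...)
rather than copying the array wholesale.

```perl
use strict;
use warnings;

# Hypothetical sample data: small non-negative integers only.
my @in = (7, 3, 7, 12, 3, 0, 12);

# Case 2 variant: an array serves as the "seen" table, indexed by value.
# Order of first appearance is preserved.
my @saw;
my @out2 = grep(!$saw[$_]++, @in);

# Case 3 variant: each value is stored at its own index, so a duplicate
# just overwrites itself.  Unassigned slots stay undef and are skipped.
my @ary;
@ary[@in] = @in;
my @out3 = grep(defined($_), @ary);

print "case 2: @out2\n";           # case 2: 7 3 12 0
print "case 3: @out3\n";           # case 3: 0 3 7 12
```

The array grows to the largest value in @in, which is why this only
makes sense when the integers really are small.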


I guess I'll add it to the FAQ.


--tom