Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!cs.utexas.edu!samsung!usc!henry.jpl.nasa.gov!elroy.jpl.nasa.gov!jpl-devvax!lwall
From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Newsgroups: comp.lang.perl
Subject: Re: need perl help
Message-ID: <6723@jpl-devvax.JPL.NASA.GOV>
Date: 4 Jan 90 02:12:13 GMT
References: <229@carssdf.UUCP>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA
Lines: 40

In article <229@carssdf.UUCP> usenet@carssdf.UUCP (UseNet Id.) writes:
: I would like to remove pairs of a letter from a string.  After I remove
: spaces & vowels, something like this:
:   $a =~ tr/AEIOU/     /;
:   $a =~ s/ //og;

(The o is unnecessary: it only matters when the pattern interpolates a
variable, and this one doesn't.)

: I then would like to remove double letters something like
:      wizzard  -->  wizard
: This all goes toward building a key to compare names, addresses, etc... to
: eliminate duplicates.
: 
: Does anyone have any ideas?   There's probably a more elegant way to remove
: the vowels and spaces too, for that matter.

Yes, use a [] character class and say $a =~ s/[AEIOU ]//g; or some such.
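
To see that the one-step form gives the same key as the two-step form in
the question, here is a minimal sketch. It assumes a Perl with "my"
variables (Perl 5); the sample string and the names $name, $two_step and
$one_step are made up for illustration and are not from the original
question.

```perl
use strict;
use warnings;

# Hypothetical sample value; the question does not give one.
my $name = "JOHN Q SMITH";

# Two-step version from the question: turn vowels into spaces,
# then delete every space.
(my $two_step = $name) =~ tr/AEIOU/     /;
$two_step =~ s/ //g;

# One-step version: delete anything in the class [AEIOU ] directly.
(my $one_step = $name) =~ s/[AEIOU ]//g;

print "two-step: $two_step\n";
print "one-step: $one_step\n";
```

Both lines should print JHNQSMTH. Note that the class only lists upper
case vowels, as in the question, so lower case input passes through
untouched unless you add aeiou to the class.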

There are several ways to remove duplicate characters, but the most concise
(and probably the fastest) is to say

	$a =~ s/(.)\1/$1/g;

This does have the problem that a single pass doesn't fully reduce three
or more in a row (zzz only comes down to zz), but

	while ($a =~ s/(.)\1/$1/g) {}

will fix that.  You ought to be able to say

	$a =~ s/(.)\1+/$1/g;

but you'll get a complaint about "regexp *+ operand could be empty".
Now that I think on it, you can say

	$a =~ s/(.)\1\1?\1?/$1/g;

which will collapse runs of up to 4 identical chars into one.  How thorough
do you want to get?
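
For comparison, here is a rough sketch that runs the variants above side
by side. The sub names (squeeze_once, squeeze_loop, squeeze_bounded,
squeeze_plus) and the test words are invented for this illustration. It
assumes Perl 5; squeeze_plus additionally assumes an interpreter that
accepts a quantifier on a backreference, which Perl 5 does, unlike the
one that produces the complaint quoted above.

```perl
use strict;
use warnings;

# One pass of the pair substitution: each matched pair becomes one char.
sub squeeze_once {
    my ($s) = @_;
    $s =~ s/(.)\1/$1/g;
    return $s;
}

# Repeat the pair substitution until a pass changes nothing.
sub squeeze_loop {
    my ($s) = @_;
    while ($s =~ s/(.)\1/$1/g) {}
    return $s;
}

# Bounded form: one pass, collapses runs of 2 to 4 identical chars.
sub squeeze_bounded {
    my ($s) = @_;
    $s =~ s/(.)\1\1?\1?/$1/g;
    return $s;
}

# Assumption: this interpreter allows + on a backreference.
sub squeeze_plus {
    my ($s) = @_;
    $s =~ s/(.)\1+/$1/g;
    return $s;
}

for my $word ("wizzard", "wizzzard", "wizzzzzard") {
    printf "%-11s once=%s loop=%s bounded=%s plus=%s\n",
        $word,
        squeeze_once($word), squeeze_loop($word),
        squeeze_bounded($word), squeeze_plus($word);
}
```

With five z's the single pass leaves wizzzard and the bounded form leaves
wizzard, while the loop and the \1+ form both get all the way down to
wizard. For a dedup key the loop is the safe choice if you can't rely on
\1+ being accepted.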

Larry