Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site rabbit.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!alice!rabbit!wolit
From: wolit@rabbit.UUCP (Jan Wolitzky)
Newsgroups: net.puzzle
Subject: Re: how many words can you make from gorilla?
Message-ID: <2733@rabbit.UUCP>
Date: Thu, 19-Apr-84 12:04:18 EST
Article-I.D.: rabbit.2733
Posted: Thu Apr 19 12:04:18 1984
Date-Received: Fri, 20-Apr-84 01:40:17 EST
Organization: AT&T Bell Laboratories, Murray Hill
Lines: 43

The question posed was: how would you write a program to find all the
English words that can be made from any given word.  For instance,
from the letters of "gorilla" you can form "grill," "rail," etc.

The problem can be expressed as the combination of two easily solved
sub-problems.  Consider the original word to be a set containing 'n' 
distinct elements, one representing each letter in the word (note that 
we consider the two 'l's to be distinct).  The first problem is to
form the (2**n)-2 distinct subsets of this set (we are not interested in
the empty string, nor in the original string itself).  This can be done 
by a simple combinatorial program.  Note that this step does not
produce every possible word formable from the original, since not all
permutations are produced (for example, from "gorilla" we would
produce one word containing 'r', 'a', 'i', and 'l', but we would not
get both "rail" and "lair" -- in fact, we might not get either, but
"lria" instead).

The next problem is to find all the English words that are anagrams of
each string produced in the first section.  Jon Bentley, in his
column in the "Communications of the ACM" entitled "Programming Pearls,"
addressed this question in the August, September, and October issues
last year.  The scheme is as follows:  Preprocess your dictionary
by tagging each word with a string formed from the letters of that word 
arranged in alphabetical order.  This maps words that are anagrams 
of each other into identical strings.  Sort the dictionary based on
these tags (this groups anagram classes together).  Now do the same
for the list of strings generated by the first section (note that none
of these strings will be anagrams of each other).  (Note, too, that
this will already be done if we are careful in the design of the 
program that generates the substrings.) The task is now simply to go 
through the alphabetized lists looking for matches.

The dictionary processing, of course, need only be done once; the
result can be used for future word-generation searchs.  The generation
of candidate strings takes exponential time, depending on the length
of the string, but this scheme is still much more efficient than generating 
all possible permutations of these strings, thus forming all possible
anagrams, sorting this list, and checking against a sorted dictionary.

I highly recommend Bentley's column to anyone interested in learning
elegant ways to attack thorny problems.

	Jan Wolitzky, AT&T Bell Labs, Murray Hill, NJ