Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Posting-Version: version B 2.10.1 6/24/83; site rabbit.UUCP Path: utzoo!watmath!clyde!burl!ulysses!allegra!alice!rabbit!wolit From: wolit@rabbit.UUCP (Jan Wolitzky) Newsgroups: net.puzzle Subject: Re: how many words can you make from gorilla? Message-ID: <2733@rabbit.UUCP> Date: Thu, 19-Apr-84 12:04:18 EST Article-I.D.: rabbit.2733 Posted: Thu Apr 19 12:04:18 1984 Date-Received: Fri, 20-Apr-84 01:40:17 EST Organization: AT&T Bell Laboratories, Murray Hill Lines: 43 The question posed was: how would you write a program to find all the English words that can be made from any given word. For instance, from the letters of "gorilla" you can form "grill," "rail," etc. The problem can be expressed as the combination of two easily solved sub-problems. Consider the original word to be a set containing 'n' distinct elements, one representing each letter in the word (note that we consider the two 'l's to be distinct). The first problem is to form the (2**n)-2 distinct subsets of this set (we are not interested in the empty string, nor in the original string itself). This can be done by a simple combinatorial program. Note that this step does not produce every possible word formable from the original, since not all permutations are produced (for example, from "gorilla" we would produce one word containing 'r', 'a', 'i', and 'l', but we would not get both "rail" and "lair" -- in fact, we might not get either, but "lria" instead). The next problem is to find all the English words that are anagrams of each string produced in the first section. Jon Bentley, in his column in the "Communications of the ACM" entitled "Programming Pearls," addressed this question in the August, September, and October issues last year. The scheme is as follows: Preprocess your dictionary by tagging each word with a string formed from the letters of that word arranged in alphabetical order. This maps words that are anagrams of each other into identical strings. Sort the dictionary based on these tags (this groups anagram classes together). Now do the same for the list of strings generated by the first section (note that none of these strings will be anagrams of each other). (Note, too, that this will already be done if we are careful in the design of the program that generates the substrings.) The task is now simply to go through the alphabetized lists looking for matches. The dictionary processing, of course, need only be done once; the result can be used for future word-generation searchs. The generation of candidate strings takes exponential time, depending on the length of the string, but this scheme is still much more efficient than generating all possible permutations of these strings, thus forming all possible anagrams, sorting this list, and checking against a sorted dictionary. I highly recommend Bentley's column to anyone interested in learning elegant ways to attack thorny problems. Jan Wolitzky, AT&T Bell Labs, Murray Hill, NJ