Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!hp4nl!star.cs.vu.nl!maart
From: maart@cs.vu.nl (Maarten Litmaath)
Newsgroups: comp.lang.perl
Subject: Re: Longest word composed of Unix commands
Message-ID: <8511@star.cs.vu.nl>
Date: 15 Dec 90 02:00:08 GMT
References: <77888@iuvax.cs.indiana.edu>
Sender: news@cs.vu.nl
Reply-To: maart@cs.vu.nl (Maarten Litmaath)
Organization: VU Dept. of Computer Science, Amsterdam, The Netherlands
Lines: 150

In article <77888@iuvax.cs.indiana.edu>,
	sahayman@iuvax.cs.indiana.edu (Steve Hayman) writes:
)
)Have you ever wondered what the longest word is that can be spelled
)with consecutive Unix commands?  (i.e. "fingertip" = "finger" + "tip")
)You have?  Well, stop worrying.  Here's a script that will find them
)by seeing which words in /usr/dict/words can be spelled via
)combinations of commands in /bin:/usr/bin:/usr/ucb 

Nice indeed, Steve!
I've changed your script in a few ways, though:

	- /etc and /usr/etc are now searched too, which leads to the
	  next change
	- each entry is checked to see whether it really is an executable
	- duplicate entries (from different directories) are removed

	and most importantly

	- it's shown HOW each word can be broken up into UNIX commands!

Some words have more than one `representation' in the `UNIX vector space'.
Example:
	view
	vi-e-w
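
A minimal sketch of how both representations of such a word could be
enumerated directly, by peeling off one command-name prefix at a time.
This is not how the script further down does it (that one generates every
hyphenation and filters with a regular expression); it is only an
illustration.  The table %cmds and the name split_word() are made up for
the example, and unlike the script the lookup here is case-sensitive.

```perl
# Sketch only: %cmds is a hypothetical hand-made sample, not the result
# of scanning any real directory.
%cmds = ('vi', 1, 'e', 1, 'w', 1, 'view', 1);

# split_word() returns every way to write its argument as a
# concatenation of keys of %cmds, with the pieces joined by '-'.
# For the empty string it returns a single empty string, meaning
# "nothing left to split".
sub split_word {
	local($word) = @_;
	local(@ans, $prelen, $pre, $suf);

	return ('') if $word eq '';
	for $prelen (1 .. length($word)) {
		$pre = substr($word, 0, $prelen);
		next unless $cmds{$pre};	# prefix is not a command
		foreach $suf (&split_word(substr($word, $prelen))) {
			push(@ans, $suf eq '' ? $pre : $pre . '-' . $suf);
		}
	}
	@ans;
}

print join("\n", &split_word('view')), "\n";
```

With the sample table above this prints `vi-e-w' and then `view'; a word
with no representation at all gives an empty list.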

I don't have much experience with Perl yet, so my version of the script
can probably be improved as well.
Here's some output:

	Ac-ta-e-on
	Cal-cut-ta
	Sh-ar-on
	Sh-e-ld-on
	Wall-ac-e
	W-ar-sa-w
	ac-comm-od-at-e
	as-sum-e
	as-tr-id-e
	clear-head-ed
	col-line-ar
	e-du-cat-e
	e-man-at-e
	enroll-e-e
	ex-e-cut-e
	id-e-at-e
	man-at-e-e
	on-e-time
	pr-e-sum-e
	pr-e-tty
	refer-e-e
	sed-at-e
	su-cc-e-ed
	test-at-e
	time-sh-ar-e
	tr-e-as-on
	w-ar-head
	w-ar-time
	w-at-e-rsh-ed

Here's the new script:

--------------------cut here--------------------
#!/usr/local/bin/perl
# unixword v2.0
# find the words in /usr/dict/words that can be constructed
# out of unix commands.  sort alphabetically.
# show how each word can be constructed from which commands.
# /etc and /usr/etc are now searched too.
#
# v1.0 by steve hayman, dec 12/1990
# v2.0 by maarten litmaath, dec 15/1990

@dirs = ( '/bin', '/usr/bin', '/usr/ucb', '/etc', '/usr/etc');
$wordlist = '/usr/dict/words';

# step 1: get a list of executables in the various directories
# step 2: leave out all entries containing non-alphabetic characters
# use an associative array to get rid of duplicate entries

foreach $dir ( @dirs ) {
    opendir(DIR, $dir) || die "Can't opendir $dir: $!";
    foreach $f (readdir(DIR)) {
	$ent = $dir . '/' . $f;
	if ($f !~ /\W|_|\d/ && -x $ent && ! -d $ent) {
	    $files{$f} = 0;
	}
    }
    closedir(DIR);
}

@files = keys(%files);

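# step 3: construct a suitable regular expression matching
# all these filenames, one or more times in a row
# (e.g. the names at, e and sed would give '^(at|e|sed)+$',
# which matches `sedate')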

$re = '^(' . join("|", @files) .  ')+$' ;

# step 4: match the dictionary file against this pattern; store words that
# match the pattern - assoc. array indexed by word, containing word len.

open(DICT, $wordlist) || die "Can't open $wordlist: $!";

while ( <DICT> ) {
    chop;
    $len{$_} = length if /$re/io;
}

# breakup() returns an array of all possible `breakups' of its argument
# (2^(n-1) of them for a word of n letters)
# example for `abcd':
# a-b-c-d
# a-b-cd
# a-bc-d
# a-bcd
# ab-c-d
# ab-cd
# abc-d
# abcd

sub breakup {
	local($word) = @_;
	local(@L) = 1 .. length($word) - 1;
	local(@ans, @sufs, $pre, $prelen, $suf);

	for $prelen (@L) {
		$pre = substr($word, 0, $prelen);
		@sufs = &breakup(substr($word, $prelen));
		foreach $suf (@sufs) {
			push(@ans, $pre . '-' . $suf);
		}
	}
	push(@ans, $word);
	@ans;
}

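# pattern for a breakup with a leading `-' prepended: it matches only
# if every piece between the hyphens is one of the filenames
$brkupre = '^(-' . join("|-", @files) .  ')+$' ;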

# step 5: print word list alphabetically, show how each word can be
# broken up

foreach $word ( sort keys %len ) {
	@tries = &breakup($word);
	foreach $try (@tries) {
		print "$try\n" if "-$try" =~ /$brkupre/io;
	}
}
--
In the Bourne shell syntax tabs and spaces are equivalent almost everywhere.
The exception: _indented_ here documents.  :-(
Does anyone remember the famous mistake Makefile-novices often make?