Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!hp4nl!star.cs.vu.nl!maart
From: maart@cs.vu.nl (Maarten Litmaath)
Newsgroups: comp.lang.perl
Subject: Re: Longest word composed of Unix commands
Message-ID: <8511@star.cs.vu.nl>
Date: 15 Dec 90 02:00:08 GMT
References: <77888@iuvax.cs.indiana.edu>
Sender: news@cs.vu.nl
Reply-To: maart@cs.vu.nl (Maarten Litmaath)
Organization: VU Dept. of Computer Science, Amsterdam, The Netherlands
Lines: 150

In article <77888@iuvax.cs.indiana.edu>,
sahayman@iuvax.cs.indiana.edu (Steve Hayman) writes:
)
)Have you ever wondered what the longest word is that can be spelled
)with consecutive Unix commands? (i.e. "fingertip" = "finger" + "tip")
)You have? Well, stop worrying. Here's a script that will find them
)by seeing which words in /usr/dict/words can be spelled via
)combinations of commands in /bin:/usr/bin:/usr/ucb

Nice indeed, Steve! I've changed your script in a few ways, though:
- /etc and /usr/etc are now searched too, which leads to the next change,
  since those directories also hold plain data files
- each entry is checked to see whether it really is an executable
  (and not a directory)
- duplicate entries (from different directories) are removed
and, most importantly:
- the script shows HOW each word can be broken up into UNIX commands!

Some words have more than one `representation' in the `UNIX vector space'.
Example (`view' is itself a command):

	view
	vi-e-w

I don't have much experience with Perl yet, so my version of the script
can probably be improved as well. Here's some output:

	Ac-ta-e-on
	Cal-cut-ta
	Sh-ar-on
	Sh-e-ld-on
	Wall-ac-e
	W-ar-sa-w
	ac-comm-od-at-e
	as-sum-e
	as-tr-id-e
	clear-head-ed
	col-line-ar
	e-du-cat-e
	e-man-at-e
	enroll-e-e
	ex-e-cut-e
	id-e-at-e
	man-at-e-e
	on-e-time
	pr-e-sum-e
	pr-e-tty
	refer-e-e
	sed-at-e
	su-cc-e-ed
	test-at-e
	time-sh-ar-e
	tr-e-as-on
	w-ar-head
	w-ar-time
	w-at-e-rsh-ed

Here's the new script:
--------------------cut here--------------------
#!/usr/local/bin/perl
# unixword v2.0
# find the words in /usr/dict/words that can be constructed
# out of unix commands. sort alphabetically.
# show how each word can be constructed from which commands.
# /etc and /usr/etc are now searched too.
#
# v1.0 by steve hayman, dec 12/1990
# v2.0 by maarten litmaath, dec 15/1990

@dirs = ( '/bin', '/usr/bin', '/usr/ucb', '/etc', '/usr/etc');
$wordlist = '/usr/dict/words';

# step 1: get a list of executables in the various directories
# step 2: leave out all entries containing non-alphabetic characters
# use an associative array to get rid of duplicate entries

foreach $dir ( @dirs ) {
	opendir(DIR, $dir) || die "Can't opendir $dir: $!";
	foreach $f (readdir(DIR)) {
		$ent = $dir . '/' . $f;
		if ($f !~ /\W|_|\d/ && -x $ent && ! -d $ent) {
			$files{$f} = 0;
		}
	}
	closedir(DIR);
}
@files = keys(%files);

# step 3: construct a suitable regular expression matching
# all these filenames

$re = '^(' . join("|", @files) . ')+$' ;

# step 4: match the dictionary file against this pattern; store words that
# match the pattern - assoc. array indexed by word, containing word len.

open(DICT, $wordlist) || die "Can't open $wordlist: $!";
while (<DICT>) {
	chop;
	$len{$_} = length if /$re/io;
}
close(DICT);

# breakup() returns an array of all possible `breakups' of its argument
# example for `abcd':
#	a-b-c-d
#	a-b-cd
#	a-bc-d
#	a-bcd
#	ab-c-d
#	ab-cd
#	abc-d
#	abcd

sub breakup {
	local($word) = @_;
	local(@L) = 1 .. length($word) - 1;
	local(@ans, @sufs, $pre, $prelen, $suf);

	for $prelen (@L) {
		$pre = substr($word, 0, $prelen);
		@sufs = &breakup(substr($word, $prelen));
		foreach $suf (@sufs) {
			push(@ans, $pre . '-' . $suf);
		}
	}
	push(@ans, $word);
	@ans;
}

# a breakup is valid if every piece (each preceded by a '-') is a command
$brkupre = '^(-' . join("|-", @files) . ')+$' ;

# step 5: print word list alphabetically, show how each word can be
# broken up

foreach $word ( sort keys %len ) {
	@tries = &breakup($word);
	foreach $try (@tries) {
		print "$try\n" if "-$try" =~ /$brkupre/io;
	}
}
--
In Bourne shell syntax, tabs and spaces are equivalent almost everywhere.
The exception: _indented_ here documents. :-(
Does anyone remember the famous mistake Makefile novices often make?
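The script's breakup() produces all 2**(n-1) hyphenations of an n-letter word and only then filters them through $brkupre, so the work doubles with every extra letter. A possible alternative, sketched below in modern Perl 5 (strict, `my` and references, none of which the 1990 perl has), tries only prefixes that are themselves command names and memoises the breakups of each suffix. This is a different technique from the one in the script, not a tested drop-in replacement. The command list in %is_cmd is a hypothetical sample; wired into the script it would be the %files hash, tested with exists() because the script stores 0 as each value. The lc() call mimics the case-insensitive /i match, so `Sharon' still comes out as Sh-ar-on.

```perl
use strict;
use warnings;

# Hypothetical sample of command names, not a real scan of /bin etc.;
# in the script this role is played by the %files hash.
my %is_cmd;
$is_cmd{$_} = 1 for qw(vi e w ex cut sh ar on ed);

my %memo;    # suffix => reference to the list of its breakups

# Return a reference to a list of all ways to write $word as command
# names joined by '-'; an empty list means it cannot be done.
sub breakups {
    my ($word) = @_;
    return [''] if $word eq '';    # the empty suffix has one empty breakup
    return $memo{$word} if exists $memo{$word};

    my @ans;
    for my $len (1 .. length $word) {
        my $pre = substr($word, 0, $len);
        next unless $is_cmd{ lc $pre };    # prune: prefix must be a command
        for my $rest (@{ breakups(substr($word, $len)) }) {
            push @ans, $rest eq '' ? $pre : "$pre-$rest";
        }
    }
    return $memo{$word} = \@ans;
}

print "$_\n" for @{ breakups('view') };    # prints vi-e-w
```

With this sample list only vi-e-w comes out for `view', because view itself is not in %is_cmd; add it and the bare `view' appears as a second breakup, as in the script's output.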