Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!princeton!allegra!mit-eddie!ll-xn!cit-vax!usc-oberon!sdcrdcf!trwrb!desint!geoff From: geoff@desint.UUCP Newsgroups: net.sources Subject: ispell repost (less dict) 01/02: enhanced, fixed Message-ID: <296@desint.UUCP> Date: Sat, 14-Mar-87 05:01:00 EST Article-I.D.: desint.296 Posted: Sat Mar 14 05:01:00 1987 Date-Received: Sun, 15-Mar-87 23:35:34 EST Reply-To: geoff@desint.UUCP (Geoff Kuenning) Followup-To: net.sources.bugs Distribution: world Organization: Interrupt Technology Corp., Manhattan Beach, CA Lines: 1828 : This is a definitive integrated/enhanced ispell (except the dictionary). : Everybody else's work has been installed, and many other bugs have : been fixed. I have also written a spelling-list suffix muncher. : See the first file in the shar (UPDATE) for more details. : : Also, don't forget to pick up my three companion postings of dictionary : diff's in net.sources.bugs. : : Geoff Kuenning : {hplabs,ihnp4}!trwrb!desint!geoff : #! /bin/sh # This is a shell archive, meaning: # 1. Remove everything above the #! /bin/sh line. # 2. Save the resulting text in a file. # 3. Execute the file with /bin/sh (not csh) to create the files: # UPDATE # Makefile # ispell.man # README # WISHES # expand.awk # expand1.sed # expand2.sed # munchlist.sh # ispell.el # buildhash.c # This archive created: Sat Mar 14 00:58:44 1987 export PATH; PATH=/bin:$PATH echo shar: extracting "'UPDATE'" '(5252 characters)' if test -f 'UPDATE' then echo shar: will not over-write existing file "'UPDATE'" else sed 's/^X //' << \SHAR_EOF > 'UPDATE' X Ispell enhancements - 3/13/87 X X (See three companion postings in net.sources.bugs). X X Here are the enhancements to ispell that I mentioned a couple of days ago. X Because of the number of changes, several of the context diff's are bigger X than the original files. 
In addition, many people have gotten confused X about versions, since enhancements/fixes have been made by six different X people, counting myself (for the list, see the end of ispell.man). I X have integrated all of these fixes and enhancements in one place. X X For these reasons, I have decided to repost all of the sources for ispell, X with one exception -- the dictionary. (A couple of small files, such X as ispell.el, are unchanged, but I decided to repost them anyway for X completeness. If you didn't have ispell before, you now need only the X dictionary). X X The dictionary is a special case: if you think about it, even ordinary X diff's will always work with "patch" on that each-line-is-unique file. X An out-of-place insertion can be corrected by sorting the dictionary X after patching (something that is done anyway as a side effect of the X new "munchlist" script). Because of this, I have decided not to repost X the sizable dictionary. In the process of testing this code, it occurred X to me to run dict.191 through UNIX "spell"; the results of that are X given in three companion postings in net.sources.bugs, which seemed X like a more appropriate place for the diffs. (The split into three X postings is not due to their size; see comments in the postings for my X reasons). X X Now, here's what I've done: X X In ispell itself: X X - The personal dictionary is now hashed, just like the main one, and X supports suffixes just like the main one. (It's not actually X integrated with the main one, because expanding the main one X is inefficient and poses a minor but troublesome technical X problem). A personal dictionary of 28000+ words can be read in X within a few minutes (hey, nobody's perfect -- whatcha doing X with such a big dictionary anyway? :-). X - New option "-c" is used by the new munchlist script to generate X suggested root/suffix combinations.
X - The -d option can now specify /dev/null, if you want to use X only your personal dictionary (this also saves startup time X with -c, and is used by the "munchlist" script, which is why X I put it in). X - The -p option is now more flexible about its handling of pathnames. X An absolute pathname is always interpreted literally. A X relative pathname from WORDLIST is looked up in $HOME first, X then in the current directory. The -p option behaves in the X reverse fashion: current directory first, then $HOME. This X behavior seems more intuitive to me; I'd be interested in X opinions of others if you don't find it intuitive. X - Perhaps most important, I have completely overhauled the logic X in good.c, so that it (I think) matches what the README file X says it should, no more, no less. The code has been extensively X tested, notably by interaction with the new expansion scripts; X nevertheless because of the extent of the changes and the X nature of the logic, I'd suggest a bit of suspicion for a while. X A technique we've found useful here is to do your normal work X with ispell, and then do a final check with UNIX spell or some X other slow, inconvenient program to make sure ispell didn't X screw up. X X New scripts: X X - expand.awk: an obsolete (but correct) awk script that does X the same thing as expand[12].sed, except slower. The awk X script is also much easier to understand than the sed scripts. X Superseded by the sed scripts, except for very short input. X - expand[12].sed: the sed pipe X X "sed -f expand1.sed $file | sed -f expand2.sed" X X where "$file" is a raw dictionary file with suffixes X (e.g., dict.191), generates a list of each root alone, plus X the root expanded with each possible suffix (e.g., X "BOTH/R/Z" produces "BOTH", "BOTHER", and "BOTHERS"). The X output should usually be sorted with the -u switch before X further processing. 
These scripts are used by 'munchlist'; X they are also useful for (a) checking an ispell dictionary X with some other spell-checking program and (b) figuring X out what a particular suffix does to a certain word without X reading the README file. X - munchlist.sh: a slow but effective shell script that takes X lists of expanded or unexpanded words as input and reduces X them to a (usually smaller) list of roots and suffixes. The X result is written to standard output. I think the documentation X forgot to mention that the input must be one word per line. I X have successfully used this script to combine dict.191 with X /usr/dict/words; it's also useful (and a lot faster) on X private dictionaries. For private dictionaries, it will also X remove any word that has since been added to the main dictionary. X X Oh yes, I almost forgot: the original documentation didn't mention X that ispell is a long-name program. If your "File:" display on the X top line actually contains the misspelled word, you have long-name problems. X My fixes don't address long names, because I finally have a way to X compile long-name programs, thanks to "hash8". X X Geoff Kuenning X geoff@ITcorp.COM X ...!trwrb!desint!geoff SHAR_EOF if test 5252 -ne "`wc -c < 'UPDATE'`" then echo shar: error transmitting "'UPDATE'" '(should have been 5252 characters)' fi fi # end of overwriting check echo shar: extracting "'Makefile'" '(1198 characters)' if test -f 'Makefile' then echo shar: will not over-write existing file "'Makefile'" else sed 's/^X //' << \SHAR_EOF > 'Makefile' X # -*- Mode: Text -*- X X # Look over config.h before building. X # X # LIBDIR, DEFHASH, DEFDICT should match definitions in config.h. X # X # The ifdef NO8BIT may be used if 8 bit extended text characters X # cause problems, or you simply don't wish to allow the feature.
X # X # the argument syntax for buildhash to make alternate dictionary files X # is simply: X # X # buildhash X X CFLAGS = -O X BINDIR = /usr/local/bin X LIBDIR = /usr/local/lib X DEFHASH = ispell.hash X DEFDICT = dict.191 X X # TERMLIB = -lcurses X TERMLIB = -ltermlib X all: buildhash ispell $(DEFHASH) X X ispell.hash: buildhash $(DEFDICT) X buildhash X X install: buildhash ispell $(DEFHASH) X cp ispell ${BINDIR}/ispell X cp munchlist.sh $(BINDIR)/munchlist X cp ispell.hash ${LIBDIR}/${DEFHASH} X cp expand1.sed expand2.sed $(LIBDIR) X chmod 755 ${BINDIR}/ispell $(BINDIR)/munchlist X chmod 644 ${LIBDIR}/$(DEFHASH) $(LIBDIR)/expand1.sed \ X $(LIBDIR)/expand2.sed X X buildhash: buildhash.o hash.o X cc -o buildhash buildhash.o hash.o X X ispell: ispell.o term.o good.o lookup.o hash.o tree.o X cc $(CFLAGS) -o ispell ispell.o term.o good.o lookup.o \ X hash.o tree.o $(TERMLIB) X X clean: X rm -f *.o buildhash ispell core a.out mon.out hash.out \ X *.stat *.cnt SHAR_EOF if test 1198 -ne "`wc -c < 'Makefile'`" then echo shar: error transmitting "'Makefile'" '(should have been 1198 characters)' fi fi # end of overwriting check echo shar: extracting "'ispell.man'" '(8455 characters)' if test -f 'ispell.man' then echo shar: will not over-write existing file "'ispell.man'" else sed 's/^X //' << \SHAR_EOF > 'ispell.man' X .\" -*- Mode:Text -*- X .\" X .TH ISPELL local MIT X .SH NAME X ispell \- Correct spelling for a file X .br X munchlist \- Combine suffixes in a spelling list X .SH SYNOPSIS X .B ispell X [ X .B \-x X | X .B \-d X file | X .B \-p X file | X .B \-w X chars ] file ..... 
X .br X .B ispell X [ X .B \-d X file | X .B \-p X file | X .B \-w X chars ] X .B \-l X .br X .B ispell X [ X .B \-d X file | X .B \-p X file X ] X .B \-a X .br X .B ispell X [ X .B \-d X file | X .B \-p X file | X .B \-w X chars ] X .B \-c X .br X .B munchlist X [ X .B \-d X file | X .B \-e X | X .B \-w X chars ] X [ files ] X .SH DESCRIPTION X .PP X .I Ispell X is fashioned after the X .I spell X program from ITS (called X .I ispell X on Twenex systems.) The most common usage is "ispell filename". In this X case, X .I ispell X will display each word which does not appear in the dictionary, and X allow you to change it. If there are "near misses" in the dictionary X (words which differ by only a single letter, a missing or extra letter, X or a pair of transposed letters), then they are also displayed. If you X think the word is correct as it stands, you can type either "Space" to X accept it this one time, or "I" to accept it and put it in your private X dictionary. If one of the near misses is the word you want, type the X corresponding number. Finally, if none of these choices is right, you X can type "R" and you will be prompted for a replacement word. X If you want to see a list of words that might be close using wildcard X characters, type "L" to lookup a word in the system dictionary. X .PP X When a misspelled word is found, it is printed at the top of the screen. X Any near misses will be printed on the following lines, and finally, two X lines containing the word are printed at the bottom of the screen. If X your terminal can type in reverse video, the word itself is highlighted. X .PP X The X .B \-l X or "list" option to X .I ispell X is used to produce a list of misspelled words from the standard input. X .PP X The X .B \-a X is intended to be used from other programs through a pipe. In this X mode, X .I ispell X expects the standard input to consist of single words. Each word is X read, and a single line is written to the standard output. 
If the word X was found in the main dictionary, or your personal dictionary, then the X line contains only a '*'. If the word was found through suffix removal, X then the line contains a '+', a space, and the root word. If the word X is not in the dictionary, but there are near misses, then the line X contains an '&', a space, and a list of the near misses separated by X spaces. Also, each near miss is capitalized the same as the input X word. Finally, if the word does not appear in the dictionary and X there are no near misses, then the line contains only a '#'. This mode X is also suitable for interactive use when you want to figure out the X spelling of a single word. (These characters are the same as the codes X that the real spell program uses.) X .PP X The X .B \-x X option causes X .I ispell X to remove the .bak file that it normally leaves. The .bak file contains X the pre-corrected text. If there are errors opening or writing files, X the .bak file may be left for recovery purposes even with the -x option. X .PP X The X .B \-d X option is used to specify an alternate hashed dictionary file, X other than the default. If the filename does not begin with a "/", X the library directory for the default dictionary file is prefixed. X This is useful to allow dictionaries which prefer alternate British X spellings ("centre", "tyre", etc.), or add lists of special-purpose X jargon and acronyms for subclasses of documents. There are some shortcomings X in attempting to provide foreign-language dictionaries, but something X like "-dfrench" could be made to work somewhat. X The X .B \-d X option may specify X .IR /dev/null , X in which case the dictionary is limited to the personal one. X This may be useful for certain private dictionaries. X .PP X The X .B \-p X option is used to specify an alternate personal dictionary file. X If the file name does not begin with "/", $HOME is prefixed.
Also, the X environment variable WORDLIST may be set, which renames the personal dictionary X in the same manner. The command line overrides the WORDLIST setting. If X neither is present, "ispell.words" is used. X .PP X The X .B \-w X option may be used to specify characters other than alphabetics X which may also appear in words. For instance, X .B \-w X "&" will allow "AT&T" X to be picked up. Underscores are useful in many technical documents. X There is an admittedly crude provision in this option for 8-bit international X characters. If "n" appears in the character string, the three characters X following it are the DECIMAL code (0 - 255) of the character. There must be X three decimal digits in all cases, so you have to pad with leading 0's, X for instance, to include bells and formfeeds in your words (an admittedly X silly thing to do, but aren't most pedagogical examples?): X .PP X n007n012 X .PP X Numeric digits other than the three following "n" are simply numeric X characters. Using "n" this way does not conflict with anything, because listing X an actual alphabetic has no meaning: alphabetics are already accepted. X .I Ispell X will typically be used with input from a file, meaning that preserving X parity for possible 8 bit characters from the input text is OK. If you X specify the -l option, and actually type text from the terminal, this may X create problems if your stty settings preserve parity. X .PP X The X .B \-c X option is primarily intended for use by the X .I munchlist X shell script. X In this mode, a list of words is read from the standard input. X For each word, a list of possible root words and suffixes will be X written to the standard output. X Some of the root words will be illegal and must be filtered from the X output by other means; X the X .I munchlist X script does this.
X As an example, the command "echo BOTHER | ispell -c" produces: X .PP X .RS X .nf X BOTH X BOTHE/R X BOTH/R X .fi X .RE X .PP X The X .I munchlist X shell script is used to reduce the size of dictionary files, X primarily personal dictionary files. X It is also capable of combining dictionaries from various sources. X The given X .I files X are read (standard input if no arguments are given), X reduced to a minimal set of roots and suffixes that will match the X same list of words, and written to standard output. X .PP X Normally, words that are in the default dictionary are removed by X .I munchlist X during processing. X If the list is to be used with a different dictionary, the X .B \-d X option can be used to specify an alternate (hashed) dictionary file X containing words to be removed from the output list. X If a dictionary file of X .I /dev/null X is specified, no words will be removed from the output; X this is useful when munching the primary dictionary file. X .PP X The X .B \-w X option is passed on to X .IR ispell . X The X .B \-e X ("efficient") option causes the script to use a slower algorithm that uses X somewhat less space in TMPDIR (normally X .IR /usr/tmp ")." X .PP X It is possible to install X .I ispell X in such a way as to only support ASCII range text if desired. X .SH DEFAULT FILES X /usr/public/lib/ispell.hash X .br X /usr/dict/web2 for the Lookup function X .br X $HOME/ispell.words user's private dictionary X .br X /usr/public/lib/expand[12].sed sed scripts for expanding suffixes X .SH SEE ALSO X spell(1), egrep(1), look(1) X .SH BUGS X It takes about five seconds for X .I ispell X to read in the hash table. X .sp X Perhaps more than ten choices should be allowed for near misses. X .sp X The hash table is stored as a quarter-megabyte array, so a PDP-11 X version does not seem likely. X .sp X .I Ispell X should understand more X .I troff X syntax, and deal more intelligently with contractions. 
X .sp X While alternate dictionaries for foreign languages could be defined, and X the international characters included in words, rules concerning X word endings and pluralization accommodate English only. X .sp X .I Munchlist X is very slow, and requires tremendous amounts of temporary file space for X large dictionaries. X It does respect the TMPDIR environment variable, so this space can be X redirected. X However, a lot of the temporary space it needs is for sorting, so TMPDIR X is only a partial help on systems with an uncooperative X .IR sort (1). X As a benchmark, the 15000-word X .I dict.191 X takes about 1200 blocks in TMPDIR, and 2000 in X .IR sort "'s" X temporary directories. X On a 68000 workstation, it runs for the better part of an hour. X Munching X .I dict.191 X with X .I /usr/dict/words X (28000 words output) X took another 1500 blocks or so, and ran for about three hours. X .SH AUTHOR X Pace Willisson (pace@mit-vax) X .br X Enhanced by James Woods, Bob McQueer, Bill Randle, Marc Ries, Rob McMahon, X and Geoff Kuenning. SHAR_EOF if test 8455 -ne "`wc -c < 'ispell.man'`" then echo shar: error transmitting "'ispell.man'" '(should have been 8455 characters)' fi fi # end of overwriting check echo shar: extracting "'README'" '(6256 characters)' if test -f 'README' then echo shar: will not over-write existing file "'README'" else sed 's/^X //' << \SHAR_EOF > 'README' X -*- Mode:Text -*- X X Ispell consists of two programs: the actual spelling checker, "ispell", X and the hash table builder, "buildhash". Everything is set up so you X can just say "make install" and have everything happen. You might want X to edit the makefile and ispell.h to change the destinations of the X program and the hash table. X X The dictionary comes from the ITS spell dictionary. I got it from X "ml:wba;dict 191", although I don't know whether this is the copy currently X in use on the 20's around MIT.
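Besides interactive use, the spelling checker can be driven through the -a pipe interface documented in ispell.man above, which answers each input word with one status line. The following is a minimal caller-side sketch, assuming exactly one response line per word and the four codes ('*', '+', '&', '#') described there. The function name classify_ispell_line is hypothetical and is not part of ispell.

```shell
#!/bin/sh
# Hypothetical helper (not part of ispell): interpret one response line
# from "ispell -a", using the codes documented in ispell.man:
#   *            word found in the main or personal dictionary
#   + ROOT       word found through suffix removal
#   & W1 W2 ...  word not found, but near misses exist
#   #            word not found, no near misses
classify_ispell_line() {
    line=$1
    case $line in
    '*')
        echo "correct" ;;
    '+ '*)
        # Strip the two-character "+ " prefix, leaving the root word.
        echo "correct via suffix, root ${line#??}" ;;
    '& '*)
        # Strip the two-character "& " prefix, leaving the near misses.
        echo "misspelled, near misses: ${line#??}" ;;
    '#')
        echo "misspelled, no near misses" ;;
    *)
        # Anything that does not match the four documented codes.
        echo "unrecognized response: $line" ;;
    esac
}
```

With an installed ispell that behaves as documented, the function could sit at the end of a pipeline such as `echo BOTHER | ispell -a | while read -r line; do classify_ispell_line "$line"; done`. The quoted case patterns match the code characters literally, so the '*' response is not treated as a wildcard.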
X X ---------------------------------------------------------------------- X X Addendum: X X My eternal gratitude to the author of ispell -- I don't know how I X ever lived without it. I received his permission to post ispell to X the net and have added a GNU EMACS interface. Look in the file X ispell.el for installation instructions. X X As far as I know, no one informally "supports" this program. If you X would like to "adopt" it (collect fixes/enhancements and post a new X version periodically), feel free to do so. X X I volunteer to collect dictionary diffs and post a composite diff X periodically. If you add a lot of words to the main dictionary, send X me the diffs between the modified dictionary and the posted one. X Also, if you have access to a TOPS20 system with a more complete X dictionary in ispell format, send me the diffs if you can. Just X PLEASE don't dump an entire dictionary to our site! X X The dictionary posted is one I snarfed from around here -- after X comparison with the one originally supplied, ours appears a tad more X complete and accurate. X X Walt Buehring X Texas Instruments - Computer Science Center X X ARPA: Buehring%TI-CSL@CSNet-Relay X UUCP: {smu, texsun, im4u, rice} ! ti-csl ! buehring X X ---------------------------------------------------------------------- X X The following is the only documentation I could find about the format X of the dictionary. It was written for the TOPS20 speller that ispell X mimics, so I believe most of the information is applicable. It should be X useful if you want to add words to the main dictionary by hand. -WB X X ---------------------------------------------------------------------- X X 11.6 Dictionary flags X X Words in SPELL's main dictionary (but not the other dictionaries) may X have flags associated with them to indicate the legality of suffixes X without the need to keep the full suffixed words in the dictionary. The X flags have "names" consisting of single letters. 
Their meaning is as X follows: X X Let # and @ be "variables" that can stand for any letter. Upper case X letters are constants. "..." stands for any string of zero or more X letters, but note that no word may exist in the dictionary which is not at X least 2 letters long, so, for example, FLY may not be produced by placing X the "Y" flag on "F". Also, no flag is effective unless the word that it X creates is at least 4 letters long, so, for example, WED may not be X produced by placing the "D" flag on "WE". X X "V" flag: X ...E --> ...IVE as in CREATE --> CREATIVE X if # .ne. E, ...# --> ...#IVE as in PREVENT --> PREVENTIVE X X "N" flag: X ...E --> ...ION as in CREATE --> CREATION X ...Y --> ...ICATION as in MULTIPLY --> MULTIPLICATION X if # .ne. E or Y, ...# --> ...#EN as in FALL --> FALLEN X X "X" flag: X ...E --> ...IONS as in CREATE --> CREATIONS X ...Y --> ...ICATIONS as in MULTIPLY --> MULTIPLICATIONS X if # .ne. E or Y, ...# --> ...#ENS as in WEAK --> WEAKENS X X "H" flag: X ...Y --> ...IETH as in TWENTY --> TWENTIETH X if # .ne. Y, ...# --> ...#TH as in HUNDRED --> HUNDREDTH X X "Y" FLAG: X ... --> ...LY as in QUICK --> QUICKLY X X "G" FLAG: X ...E --> ...ING as in FILE --> FILING X if # .ne. E, ...# --> ...#ING as in CROSS --> CROSSING X X "J" FLAG: X ...E --> ...INGS as in FILE --> FILINGS X if # .ne. E, ...# --> ...#INGS as in CROSS --> CROSSINGS X X "D" FLAG: X ...E --> ...ED as in CREATE --> CREATED X if @ .ne. A, E, I, O, or U, X ...@Y --> ...@IED as in IMPLY --> IMPLIED X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U) X ...@# --> ...@#ED as in CROSS --> CROSSED X or CONVEY --> CONVEYED X X "T" FLAG: X ...E --> ...EST as in LATE --> LATEST X if @ .ne. A, E, I, O, or U, X ...@Y --> ...@IEST as in DIRTY --> DIRTIEST X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U) X ...@# --> ...@#EST as in SMALL --> SMALLEST X or GRAY --> GRAYEST X X "R" FLAG: X ...E --> ...ER as in SKATE --> SKATER X if @ .ne.
A, E, I, O, or U, X ...@Y --> ...@IER as in MULTIPLY --> MULTIPLIER X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U) X ...@# --> ...@#ER as in BUILD --> BUILDER X or CONVEY --> CONVEYER X X "Z" FLAG: X ...E --> ...ERS as in SKATE --> SKATERS X if @ .ne. A, E, I, O, or U, X ...@Y --> ...@IERS as in MULTIPLY --> MULTIPLIERS X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U) X ...@# --> ...@#ERS as in BUILD --> BUILDERS X or SLAY --> SLAYERS X X "S" FLAG: X if @ .ne. A, E, I, O, or U, X ...@Y --> ...@IES as in IMPLY --> IMPLIES X if # .eq. S, X, Z, or H, X ...# --> ...#ES as in FIX --> FIXES X if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U) X ...@# --> ...@#S as in BAT --> BATS X or CONVEY --> CONVEYS X X "P" FLAG: X if @ .ne. A, E, I, O, or U, X ...@Y --> ...@INESS as in CLOUDY --> CLOUDINESS X if # .ne. Y, or @ = A, E, I, O, or U, X ...@# --> ...@#NESS as in LATE --> LATENESS X or GRAY --> GRAYNESS X X "M" FLAG: X ... --> ...'S as in DOG --> DOG'S X X ---------------------------------------------------------------------- X X [Whew! That's all very nice, but how about a quick reference... -WB] X X V - ive X N - ion, ication, en X X - ions, ications, ens X H - th, ieth X Y - ly X G - ing X J - ings X D - ed X T - est X R - er X Z - ers X S - s, es, ies X P - ness, iness X M - 's SHAR_EOF if test 6256 -ne "`wc -c < 'README'`" then echo shar: error transmitting "'README'" '(should have been 6256 characters)' fi fi # end of overwriting check echo shar: extracting "'WISHES'" '(1211 characters)' if test -f 'WISHES' then echo shar: will not over-write existing file "'WISHES'" else sed 's/^X //' << \SHAR_EOF > 'WISHES' X Things remaining to be done to ispell: X X - The single biggest remaining deficiency (in my opinion) is the X extensive misuse of 'strlen'. Strlen is often called repeatedly X on the same string within a few lines of code.
Worse, many X routines accept a "length" parameter (which is usually passed X by running 'strlen' within the arglist) but ignore it and X actually require the string to be null-terminated. Somebody X should do a systematic edit and clean this up. I wouldn't X be surprised to learn that ispell spends 50% of its time in X strlen. X - The "munchlist" script can actually increase the size of a X dictionary. For example, munching dict.191 (after my bugfixes X to it) reduced the number of words by about 40, but increased X the number of characters by a small percentage. This is X because munchlist doesn't recognize duplicate suffixes that X generate the same result, except for the three special X "s-ending" suffixes J, Z, and X. For example, right now X munchlist will make BATHER by adding the R suffix to both X BATH and BATHE. It would be nice if munchlist could recognize X the redundancy and reduce its output so that each word was made X in only one way. SHAR_EOF if test 1211 -ne "`wc -c < 'WISHES'`" then echo shar: error transmitting "'WISHES'" '(should have been 1211 characters)' fi fi # end of overwriting check echo shar: extracting "'expand.awk'" '(5769 characters)' if test -f 'expand.awk' then echo shar: will not over-write existing file "'expand.awk'" else sed 's/^X //' << \SHAR_EOF > 'expand.awk' X BEGIN {FS = "/"} X { X print $1 X #Let # and @ be "variables" that can stand for any letter. Upper case X #letters are constants. "..." stands for any string of zero or more X #letters, but note that no word may exist in the dictionary which is not at X #least 2 letters long, so, for example, FLY may not be produced by placing X #the "Y" flag on "F". Also, no flag is effective unless the word that it X #creates is at least 4 letters long, so, for example, WED may not be X #produced by placing the "D" flag on "WE". 
X size = length ($1) X # X # Break out the last two letters into "tail", and put X # corresponding versions of the root with the tail trimmed X # off into "trimmed". If they are vowels, set vowel[i]. X # (Actually, only vowel[2] is used). X # X for (i = 1; i < 3; i++) X { X tail[i] = substr ($1, size - i + 1, 1) X if (tail[i] == "A" || tail[i] == "E" || tail[i] == "I" \ X || tail[i] == "O" || tail[i] == "U") X vowel[i] = 1 X else X vowel[i] = 0 X trimmed[i] = substr ($1, 1, size - i) X } X for (i = 2; i <= NF; i++) X { X if ($i == "V") X { X # ...E --> ...IVE as in CREATE --> CREATIVE X # if # .ne. E, ...# --> ...#IVE as in PREVENT --> PREVENTIVE X if (tail[1] == "E") X print trimmed[1] "IVE" X else X print $1 "IVE" X } X else if ($i == "N" || $i == "X") X { X # ...E --> ...ION as in CREATE --> CREATION X # ...Y --> ...ICATION as in MULTIPLY --> MULTIPLICATION X # if # .ne. E or Y, ...# --> ...#EN as in FALL --> FALLEN X # "X" flag: X # ...E --> ...IONS as in CREATE --> CREATIONS X # ...Y --> ...ICATIONS as in MULTIPLY --> MULTIPLICATIONS X # if # .ne. E or Y, ...# --> ...#ENS as in WEAK --> WEAKENS X if ($i == "N") X plural = "" X else X plural = "S" X if (tail[1] == "E") X print trimmed[1] "ION" plural X else if (tail[1] == "Y") X print trimmed[1] "ICATION" plural X else X print $1 "EN" plural X } X else if ($i == "H") X { X # ...Y --> ...IETH as in TWENTY --> TWENTIETH X # if # .ne. Y, ...# --> ...#TH as in HUNDRED --> HUNDREDTH X if (tail[1] == "Y") X print trimmed[1] "IETH" X else X print $1 "TH" X } X else if ($i == "Y") X { X # ... --> ...LY as in QUICK --> QUICKLY X print $1 "LY" X } X else if ($i == "G" || $i == "J") X { X # ...E --> ...ING as in FILE --> FILING X # if # .ne. E, ...# --> ...#ING as in CROSS --> CROSSING X # "J" flag: X # ...E --> ...INGS as in FILE --> FILINGS X # if # .ne.
E, ...# --> ...#INGS as in CROSS --> CROSSINGS X if ($i == "G") X plural = "" X else X plural = "S" X if (tail[1] == "E") X print trimmed[1] "ING" plural X else X print $1 "ING" plural X } X else if ($i == "D") X { X # ...E --> ...ED as in CREATE --> CREATED X # if @ .ne. A, E, I, O, or U, X # ...@Y --> ...@IED as in IMPLY --> IMPLIED X # if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U) X # ...@# --> ...@#ED as in CROSS --> CROSSED X # or CONVEY --> CONVEYED X if (tail[1] == "E") X print $1 "D" X else if (tail[1] == "Y" && !vowel[2]) X print trimmed[1] "IED" X else X print $1 "ED" X } X else if ($i == "T") X { X # ...E --> ...EST as in LATE --> LATEST X # if @ .ne. A, E, I, O, or U, X # ...@Y --> ...@IEST as in DIRTY --> DIRTIEST X # if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U) X # ...@# --> ...@#EST as in SMALL --> SMALLEST X # or GRAY --> GRAYEST X if (tail[1] == "E") X print $1 "ST" X else if (tail[1] == "Y" && !vowel[2]) X print trimmed[1] "IEST" X else X print $1 "EST" X } X else if ($i == "R" || $i == "Z") X { X # ...E --> ...ER as in SKATE --> SKATER X # if @ .ne. A, E, I, O, or U, X # ...@Y --> ...@IER as in MULTIPLY --> MULTIPLIER X # if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U) X # ...@# --> ...@#ER as in BUILD --> BUILDER X # or CONVEY --> CONVEYER X # "Z" flag: X # ...E --> ...ERS as in SKATE --> SKATERS X # if @ .ne. A, E, I, O, or U, X # ...@Y --> ...@IERS as in MULTIPLY --> MULTIPLIERS X # if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U) X # ...@# --> ...@#ERS as in BUILD --> BUILDERS X # or SLAY --> SLAYERS X if ($i == "R") X plural = "" X else X plural = "S" X if (tail[1] == "E") X print $1 "R" plural X else if (tail[1] == "Y" && !vowel[2]) X print trimmed[1] "IER" plural X else X print $1 "ER" plural X } X else if ($i == "S") X { X # if @ .ne. A, E, I, O, or U, X # ...@Y --> ...@IES as in IMPLY --> IMPLIES X # if # .eq. S, X, Z, or H, X # ...# --> ...#ES as in FIX --> FIXES X # if # .ne. 
S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U) X # ...@# --> ...@#S as in BAT --> BATS X # or CONVEY --> CONVEYS X if (tail[1] == "Y" && !vowel[2]) X print trimmed[1] "IES" X else if (tail[1] == "S" || tail[1] == "X" \ X || tail[1] == "Z" || tail[1] == "H") X print $1 "ES" X else X print $1 "S" X } X else if ($i == "P") X { X # if @ .ne. A, E, I, O, or U, X # ...@Y --> ...@INESS as in CLOUDY --> CLOUDINESS X # if # .ne. Y, or @ = A, E, I, O, or U, X # ...@# --> ...@#NESS as in LATE --> LATENESS X # or GRAY --> GRAYNESS X if (tail[1] == "Y" && !vowel[2]) X print trimmed[1] "INESS" X else X print $1 "NESS" X } X else if ($i == "M") X { X # ... --> ...'S as in DOG --> DOG'S X print $1 "'S" X } X } X } SHAR_EOF if test 5769 -ne "`wc -c < 'expand.awk'`" then echo shar: error transmitting "'expand.awk'" '(should have been 5769 characters)' fi fi # end of overwriting check echo shar: extracting "'expand1.sed'" '(1607 characters)' if test -f 'expand1.sed' then echo shar: will not over-write existing file "'expand1.sed'" else sed 's/^X //' << \SHAR_EOF > 'expand1.sed' X /^[^/]*$/n X /\/V/ { X /^[^/]*E\// { X s@\([^/]*\)E\([/A-Z]*\)/V@\1IVE\ X \1E\2@; P; D X } X s@\([^/]*\)\([/A-Z]*\)/V@\1IVE\ X \1\2@; P; D X } X /\/N/ { X /^[^/]*E\// { X s@\([^/]*\)E\([/A-Z]*\)/N@\1ION\ X \1E\2@; P; D X } X /^[^/]*Y\// { X s@\([^/]*\)Y\([/A-Z]*\)/N@\1ICATION\ X \1Y\2@; P; D X } X s@\([^/]*\)\([/A-Z]*\)/N@\1EN\ X \1\2@; P; D X } X /\/X/ { X /^[^/]*E\// { X s@\([^/]*\)E\([/A-Z]*\)/X@\1IONS\ X \1E\2@; P; D X } X /^[^/]*Y\// { X s@\([^/]*\)Y\([/A-Z]*\)/X@\1ICATIONS\ X \1Y\2@; P; D X } X s@\([^/]*\)\([/A-Z]*\)/X@\1ENS\ X \1\2@; P; D X } X /\/H/ { X /^[^/]*Y\// { X s@\([^/]*\)Y\([/A-Z]*\)/H@\1IETH\ X \1Y\2@; P; D X } X s@\([^/]*\)\([/A-Z]*\)/H@\1TH\ X \1\2@; P; D X } X /\/Y/ { X s@\([^/]*\)\([/A-Z]*\)/Y@\1LY\ X \1\2@; P; D X } X /\/G/ { X /^[^/]*E\// { X s@\([^/]*\)E\([/A-Z]*\)/G@\1ING\ X \1E\2@; P; D X } X s@\([^/]*\)\([/A-Z]*\)/G@\1ING\ X \1\2@; P; D X } X /\/J/ { X /^[^/]*E\// { X s@\([^/]*\)E\([/A-Z]*\)/J@\1INGS\ X \1E\2@; P; D X } X
s@\([^/]*\)\([/A-Z]*\)/J@\1INGS\
X \1\2@; P; D
X }
X /\/D/ {
X /^[^/]*E\// {
X s@\([^/]*\)\([/A-Z]*\)/D@\1D\
X \1\2@; P; D
X }
X /^[^/]*[^AEIOU]Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/D@\1IED\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/D@\1ED\
X \1\2@; P; D
X }
X /\/T/ {
X /^[^/]*E\// {
X s@\([^/]*\)\([/A-Z]*\)/T@\1ST\
X \1\2@; P; D
X }
X /^[^/]*[^AEIOU]Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/T@\1IEST\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/T@\1EST\
X \1\2@; P; D
X }
X /\/R/ {
X /^[^/]*E\// {
X s@\([^/]*\)\([/A-Z]*\)/R@\1R\
X \1\2@; P; D
X }
X /^[^/]*[^AEIOU]Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/R@\1IER\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/R@\1ER\
X \1\2@; P; D
X }
SHAR_EOF
if test 1607 -ne "`wc -c < 'expand1.sed'`"
then
echo shar: error transmitting "'expand1.sed'" '(should have been 1607 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'expand2.sed'" '(622 characters)'
if test -f 'expand2.sed'
then
echo shar: will not over-write existing file "'expand2.sed'"
else
sed 's/^X //' << \SHAR_EOF > 'expand2.sed'
X /^[^/]*$/n
X /\/Z/ {
X /^[^/]*E\// {
X s@\([^/]*\)\([/A-Z]*\)/Z@\1RS\
X \1\2@; P; D
X }
X /^[^/]*[^AEIOU]Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/Z@\1IERS\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/Z@\1ERS\
X \1\2@; P; D
X }
X /\/S/ {
X /^[^/]*[^AEIOU]Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/S@\1IES\
X \1Y\2@; P; D
X }
X /^[^/]*[SXZH]\// {
X s@\([^/]*\)\([/A-Z]*\)/S@\1ES\
X \1\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/S@\1S\
X \1\2@; P; D
X }
X /\/P/ {
X /^[^/]*[^AEIOU]Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/P@\1INESS\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/P@\1NESS\
X \1\2@; P; D
X }
X /\/M/ {
X s@\([^/]*\)\([/A-Z]*\)/M@\1'S\
X \1\2@; P; D
X }
SHAR_EOF
if test 622 -ne "`wc -c < 'expand2.sed'`"
then
echo shar: error transmitting "'expand2.sed'" '(should have been 622 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'munchlist.sh'" '(6218 characters)'
if test -f 'munchlist.sh'
then
echo shar: will not over-write existing file "'munchlist.sh'"
else
sed 's/^X //' << \SHAR_EOF > 'munchlist.sh'
X : Use /bin/sh
X #
X # Given a list of words for ispell, generate a reduced list
X # in which all possible suffixes have been collapsed. The reduced
X # list will match the same words as the original.
X #
X # Usage:
X #
X # munchlist [ -d hashfile ] [ -e ] [ -w chars ] [ file ] ...
X #
X # Options:
X #
X # -d hashfile
X # Remove any words that are covered by 'hashfile'. The
X # default is the default ispell dictionary. The words
X # will be removed only if all suffixes are covered by
X # the hash file. A hashfile of /dev/null should be
X # specified when the main dictionary is being munched.
X # -e Economical algorithm. This will use much less temporary
X # disk space, at the expense of time. Useful with large files
X # (such as complete dictionaries).
X # -w Passed on to ispell (specify chars that are part of a word)
X #
X # The given input files are merged, then processed by 'ispell -c'
X # to generate possible suffix lists; these are then combined
X # and reduced. The final result is written to standard output.
X #
X # For portability to older systems, I have avoided getopt.
X #
X # Geoff Kuenning
X # 2/28/87
X #
X LIBDIR=/tmp2/lib # Must match config.h
X DEFDICT=dict.191 # Must match config.h
X EXPAND1=${LIBDIR}/expand1.sed
X EXPAND2=${LIBDIR}/expand2.sed
X TDIR=${TMPDIR:-/usr/tmp}
X TMP=${TDIR}/munch$$
X
X cheap=no
X dictopt=
X wchars=
X while [ $# != 0 ]
X do
X case "$1" in
X -d)
X case "$2" in
X /dev/null)
X dictopt=NONE
X ;;
X *)
X dictopt="-d $2"
X ;;
X esac
X shift
X ;;
X -e)
X cheap=yes
X ;;
X -w)
X wchars="-w $2"
X shift
X ;;
X *)
X break
X esac
X shift
X done
X #
X # Awk program to combine suffixes onto one line
X #
X AWKMUNCH='
X {
X if ($1 != old1 && old1 != "")
X {
X print old1 suffixes
X suffixes = ""
X }
X old1 = $1
X for (i = 2; i <= NF; i++)
X suffixes = suffixes "/" $i
X }
X END { if (old1 != "") print old1 suffixes }'
X #
X # Awk program to break suffixes up into one per line
X #
X AWKUNMUNCH='
X {
X print $1
X for (i = 2; i <= NF; i++)
X print $1 "/" $i
X }'
X trap "/bin/rm -f ${TMP}*; exit 1" 1 2 15
X #
X # Collect all the input (cat), convert to uppercase (tr), expand all
X # the suffix options (two sed's), and preserve (sorted) for later
X # joining. Unless an explicitly null dictionary was specified, remove
X # all expanded words that are covered by the dictionary (ispell).
X #
X if [ "X$dictopt" = "XNONE" ]
X then
X cat "$@" | tr '[a-z]' '[A-Z]' \
X | sed -f $EXPAND1 | sed -f $EXPAND2 | sort -u > ${TMP}a
X else
X cat "$@" | tr '[a-z]' '[A-Z]' \
X | sed -f $EXPAND1 | sed -f $EXPAND2 | sort -u \
X | ispell -l $dictopt -p /dev/null > ${TMP}a
X fi
X #
X # Munch the input to generate roots and suffixes (ispell -c). We are
X # only interested in words that have at least one suffix (egrep /); the
X # next step will pick up the rest. Some of the roots are illegal. We
X # use join to restrict the output to those root words that are found
X # in the original dictionary.
X #
X # Note: one disadvantage of this pipeline is that for a large file,
X # the join and awk may be sitting around for a long time while ispell
X # and sort run. You can get rid of this by splitting the pipe, at
X # the expense of more temp file space.
X #
X if [ $cheap = yes ]
X then
X ispell $wchars -c -d /dev/null -p /dev/null < ${TMP}a \
X | egrep / | sort -u -t/ +0 -1 +1 \
X | join -t/ - ${TMP}a | awk -F/ "$AWKMUNCH" > ${TMP}b
X else
X ispell $wchars -c -d /dev/null -p /dev/null < ${TMP}a \
X | egrep / | sort -u -t/ +0 -1 +1 \
X | join -t/ - ${TMP}a > ${TMP}b
X fi
X #
X # There is now one slight problem: the suffix flags X, J, and Z
X # are simply the addition of an "S" to the suffixes N, G, and R,
X # respectively. This produces redundant entries in the output file;
X # for example, ABBREVIATE/N/X and ABBREVIATION/S. We must get rid
X # of the unnecessary duplicates. The candidates are those words that
X # have only an "S" flag (egrep). We strip off the "S" (sed), and
X # generate a list of roots that might have made these words (ispell -c).
X # Of these roots, we select those that have the N, G, or R flags,
X # replacing each with the plural equivalent X, J, or Z (sed -n).
X # Using join once again, we select those that have legal roots
X # and put them in ${TMP}c.
X #
X if [ $cheap = yes ]
X then
X egrep '^[^/]*/S$' ${TMP}b | sed 's@/S$@@' \
X | ispell -c -d /dev/null -p /dev/null \
X | sed -n -e '/\/N/s/N$/X/p' -e '/\/G/s/G$/J/p' -e '/\/R/s/R$/Z/p' \
X | sort -u -t/ +0 -1 +1 \
X | join -t/ - ${TMP}a \
X | awk -F/ "$AWKMUNCH" > ${TMP}c
X else
X egrep '^[^/]*/S$' ${TMP}b | sed 's@/S$@@' \
X | ispell -c -d /dev/null -p /dev/null \
X | sed -n -e '/\/N/s/N$/X/p' -e '/\/G/s/G$/J/p' -e '/\/R/s/R$/Z/p' \
X | sort -u -t/ +0 -1 +1 \
X | join -t/ - ${TMP}a > ${TMP}c
X fi
X #
X # Now we have to eliminate the stuff covered by ${TMP}c from ${TMP}b.
X # First, we re-expand the suffixes we just made (sed -f pair), and let
X # ispell re-create the /S version (ispell -c).
We select the /S versions
X # only (egrep), sort them (sort) for comm, and use comm to delete these
X # from ${TMP}b. The output of comm (i.e., the trimmed version of
X # ${TMP}b) is combined with our special-suffixes file ${TMP}c (sort,
X # with preceding awk, if $cheap) and reduced in size (AWKMUNCH) to
X # produce a final list of all words that have at least one suffix.
X #
X if [ $cheap = yes ]
X then
X sed -f $EXPAND1 < ${TMP}c | sed -f $EXPAND2 \
X | ispell -c -d /dev/null -p /dev/null \
X | egrep '\/S$' | sort -u -t/ +0 -1 +1 | comm -13 - ${TMP}b \
X | awk -F/ "$AWKUNMUNCH" - ${TMP}c \
X | sort -u -t/ +0 -1 +1 - \
X | awk -F/ "$AWKMUNCH" > ${TMP}d
X else
X sed -f $EXPAND1 < ${TMP}c | sed -f $EXPAND2 \
X | ispell -c -d /dev/null -p /dev/null \
X | egrep '\/S$' | sort -u -t/ +0 -1 +1 | comm -13 - ${TMP}b \
X | sort -u -t/ +0 -1 +1 - ${TMP}c \
X | awk -F/ "$AWKMUNCH" > ${TMP}d
X fi
X /bin/rm -f ${TMP}[bc]
X #
X # Now a slick trick. Use ispell to select those (root) words from the original
X # list (${TMP}a) that are not covered by the suffix list (${TMP}d). Then we
X # merge these with the suffix list and sort it to produce the final output.
X #
X ispell $wchars -d /dev/null -p ${TMP}d -l < ${TMP}a | tr -d \\015 \
X | sort -u -t/ +0 -1 +1 - ${TMP}d
X /bin/rm -f ${TMP}*
SHAR_EOF
if test 6218 -ne "`wc -c < 'munchlist.sh'`"
then
echo shar: error transmitting "'munchlist.sh'" '(should have been 6218 characters)'
fi
chmod +x 'munchlist.sh'
fi # end of overwriting check
echo shar: extracting "'ispell.el'" '(6763 characters)'
if test -f 'ispell.el'
then
echo shar: will not over-write existing file "'ispell.el'"
else
sed 's/^X //' << \SHAR_EOF > 'ispell.el'
X ;;; Spelling correction interface for GNU EMACS using "ispell"
X
X ;;; Walt Buehring
X ;;; Texas Instruments - Computer Science Center
X ;;; ARPA: Buehring%TI-CSL@CSNet-Relay
X ;;; UUCP: {smu, texsun, im4u, rice} ! ti-csl ! buehring
X
X ;;; Depends on the ispell program snarfed from MIT-PREP in early
X ;;; 1986.
The only interactive command is "ispell-word" which should be
X ;;; bound to M-$. If someone writes an "ispell-region" command,
X ;;; I would appreciate a copy.
X
X ;;; To fully install this, add this file to your GNU lisp directory and
X ;;; compile it with M-X byte-compile-file. Then add the following to the
X ;;; appropriate init file:
X
X ;;; (autoload 'ispell-word "ispell"
X ;;; "Check the spelling of word in buffer." t)
X ;;; (global-set-key "\e$" 'ispell-word)
X
X ;;; If run on a heavily loaded system, the timeout value in ispell-check
X ;;; and the initial sleep time in ispell-init-process may need to be increased.
X
X ;;; No warranty expressed or implied. All sales final. Void where prohibited.
X ;;; If you don't like it, change it.
X
X (defvar ispell-syntax-table nil)
X
X (if (null ispell-syntax-table)
X ;; The following assumes that the standard-syntax-table
X ;; is static. If you add words with funky characters
X ;; to your dictionary, the following may have to change.
X (progn
X (setq ispell-syntax-table (make-syntax-table))
X ;; Make certain characters word constituents
X (modify-syntax-entry ?' "w " ispell-syntax-table)
X (modify-syntax-entry ?- "w " ispell-syntax-table)
X ;; Get rid of existing word syntax on certain characters
X (modify-syntax-entry ?$ ". " ispell-syntax-table)
X (modify-syntax-entry ?% ". " ispell-syntax-table)))
X
X
X (defun ispell-word (&optional quietly)
X "Check spelling of word at or before dot.
X If word not found in dictionary, display possible corrections in a window
X and let user select."
X (interactive)
X (let* ((current-syntax (syntax-table))
X start end word poss replace)
X (unwind-protect
X (save-excursion
X ;; Ensure syntax table is reasonable
X (set-syntax-table ispell-syntax-table)
X ;; Move backward for word if not already on one.
X (if (not (looking-at "\\w"))
X (re-search-backward "\\w" (dot-min) 'stay))
X ;; Move to start of word
X (re-search-backward "\\W" (dot-min) 'stay)
X ;; Find start and end of word
X (or (re-search-forward "\\w+" nil t)
X (error "No word to check."))
X (setq start (match-beginning 0)
X end (match-end 0)
X word (buffer-substring start end)))
X (set-syntax-table current-syntax))
X (or quietly (message "Checking spelling of %s..." (upcase word)))
X (setq poss (ispell-check word))
X (cond ((eq poss t)
X (or quietly (message "Found %s" (upcase word))))
X ((stringp poss)
X (or quietly (message "Found it because of %s" (upcase poss))))
X ((null poss)
X (or quietly (message "Could Not Find %s" (upcase word))))
X (t (setq replace (ispell-choose poss))
X (if replace
X (progn
X (goto-char end)
X (delete-region start end)
X (insert-string replace)))))
X poss))
X
X
X (defun ispell-choose (choices)
X "Display possible corrections from list CHOICES. Return chosen word or nil
X if none chosen."
X (unwind-protect
X (save-window-excursion
X (let ((count 0)
X (words choices)
X (pick -1)
X (window-min-height 2))
X (overlay-window 3)
X (switch-to-buffer "*Choices*") (erase-buffer)
X (setq mode-line-format "-- %b --")
X (while words
X (if (> (+ 7 (current-column) (length (car words))) (window-width))
X (insert "\n"))
X (insert "(" (+ count ?a) ") " (car words) " ")
X (setq words (cdr words)
X count (1+ count)))
X (select-window (next-window))
X (while (eq pick -1)
X (message "Enter letter to replace word; Space to flush")
X (let* ((char (read-char))
X (num (1+ (- (upcase char) ?A))))
X (cond ((= char ? ) (setq pick 0))
X ((or (<= num 0) (> num count)) (ding))
X (t (setq pick num)))))
X (and (> pick 0) (nth (1- pick) choices))))
X ;; Protected forms...
X (bury-buffer "*Choices*")))
X
X
X (defun overlay-window (height)
X "Create a (usually small) window with HEIGHT lines and avoid
X recentering."
X (save-excursion
X (let ((oldot (save-excursion (beginning-of-line) (dot)))
X (top (save-excursion (move-to-window-line height) (dot)))
X newin)
X (if (< oldot top) (setq top oldot))
X (setq newin (split-window-vertically height))
X (set-window-start newin top))))
X
X
X (defvar ispell-process nil
X "Holds the process object for 'ispell'")
X
X ;;; create signal used by ispell-filter and ispell-check
X (put 'ispell-output 'error-conditions '(ispell-output))
X
X (defun ispell-check (word)
X "Check spelling of string WORD, return either t for an exact match, a string
X containing the root word for a match via suffix removal, a list of possible
X correct spellings, or nil for a complete miss."
X (ispell-init-process)
X (send-string ispell-process (concat word "\n"))
X (condition-case output
X (progn
X (sleep-for 20)
X (error "Timeout waiting for ispell process output"))
X (ispell-output (ispell-parse-output (car (cdr output))))))
X
X (defun ispell-parse-output (output)
X "Parse the OUTPUT string of 'ispell' and return a value as specified by the
X 'ispell-check' function."
X (cond
X ((string= output "*") t)
X ((string= output "#") nil)
X ((string= (substring output 0 1) "+")
X (substring output 2))
X (t
X (let ((choice-list '()))
X (while (not (string= output ""))
X (let* ((start (string-match "[A-z]" output))
X (end (string-match " \\|$" output start)))
X (if start
X (setq choice-list (cons (substring output start end)
X choice-list)))
X (setq output (substring output (1+ end)))))
X choice-list))))
X
X
X (defvar ispell-process-output ""
X "Holds partial output from the 'ispell' process")
X
X (defun ispell-filter (process output)
X "The filter-function for 'ispell'.
Signals complete line using the
X ispell-output signal"
X (if (string= "\n" (substring output (1- (length output))))
X (progn
X (setq output (concat ispell-process-output
X (substring output 0 (1- (length output))))
X ispell-process-output "")
X (signal 'ispell-output (list output)))
X (setq ispell-process-output (concat ispell-process-output output))))
X
X (defun ispell-init-process ()
X "Check status of 'ispell' process and start if necessary; set up
X filter function for output."
X (if (or (not ispell-process)
X (not (eq (process-status ispell-process) 'run)))
X (progn
X (message "Starting new ispell process...")
X (and (get-buffer "*ispell*") (kill-buffer "*ispell*"))
X (setq ispell-process (start-process "ispell" "*ispell*"
X "ispell" "-a"))
X (set-process-filter ispell-process 'ispell-filter)
X (process-kill-without-query ispell-process)
X (sit-for 3))))
X
SHAR_EOF
if test 6763 -ne "`wc -c < 'ispell.el'`"
then
echo shar: error transmitting "'ispell.el'" '(should have been 6763 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'buildhash.c'" '(6459 characters)'
if test -f 'buildhash.c'
then
echo shar: will not over-write existing file "'buildhash.c'"
else
sed 's/^X //' << \SHAR_EOF > 'buildhash.c'
X /* -*- Mode: Text -*- */
X /*
X * buildhash.c - make a hash table for ispell
X *
X * Pace Willisson, 1983
X */
X
X #include <stdio.h>
X #include <sys/param.h>
X #include <sys/types.h>
X #include <sys/stat.h>
X #include "ispell.h"
X #include "config.h"
X
X #define NSTAT 100
X struct stat dstat, cstat;
X
X int numwords, hashsize;
X
X char *malloc();
X
X struct dent *hashtbl;
X
X char *Dfile;
X char *Hfile;
X
X char Cfile[MAXPATHLEN];
X char Sfile[MAXPATHLEN];
X
X main (argc,argv)
X int argc;
X char **argv;
X {
X FILE *countf;
X FILE *statf;
X int stats[NSTAT];
X int i;
X
X if (argc > 1) {
X ++argv;
X Dfile = *argv;
X if (argc > 2) {
X ++argv;
X Hfile = *argv;
X }
X else
X Hfile = DEFHASH;
X }
X else {
X Dfile = DEFDICT;
X Hfile = DEFHASH;
X }
X
X sprintf(Cfile,"%s.cnt",Dfile);
X
sprintf(Sfile,"%s.stat",Dfile);
X
X if (stat (Dfile, &dstat) < 0) {
X fprintf (stderr, "No dictionary (%s)\n", Dfile);
X exit (1);
X }
X
X if (stat (Cfile, &cstat) < 0 || dstat.st_mtime > cstat.st_mtime)
X newcount ();
X
X if ((countf = fopen (Cfile, "r")) == NULL) {
X fprintf (stderr, "No count file\n");
X exit (1);
X }
X numwords = 0;
X fscanf (countf, "%d", &numwords);
X fclose (countf);
X if (numwords == 0) {
X fprintf (stderr, "Bad count file\n");
X exit (1);
X }
X hashsize = numwords;
X readdict ();
X
X if ((statf = fopen (Sfile, "w")) == NULL) {
X fprintf (stderr, "Can't create %s\n", Sfile);
X exit (1);
X }
X
X for (i = 0; i < NSTAT; i++)
X stats[i] = 0;
X for (i = 0; i < hashsize; i++) {
X struct dent *dp;
X int j;
X if (hashtbl[i].used == 0) {
X stats[0]++;
X } else {
X for (j = 1, dp = &hashtbl[i]; dp->next != NULL; j++, dp = dp->next)
X ;
X if (j >= NSTAT)
X j = NSTAT - 1;
X stats[j]++;
X }
X }
X for (i = 0; i < NSTAT; i++)
X fprintf (statf, "%d: %d\n", i, stats[i]);
X fclose (statf);
X
X filltable ();
X
X output ();
X exit(0);
X }
X
X output ()
X {
X FILE *outfile;
X struct hashheader hashheader;
X int strptr, n, i;
X
X if ((outfile = fopen (Hfile, "w")) == NULL) {
X fprintf (stderr, "can't create %s\n",Hfile);
X return;
X }
X hashheader.magic = MAGIC;
X hashheader.stringsize = 0;
X hashheader.tblsize = hashsize;
X fwrite (&hashheader, sizeof hashheader, 1, outfile);
X strptr = 0;
X for (i = 0; i < hashsize; i++) {
X n = strlen (hashtbl[i].word) + 1;
X fwrite (hashtbl[i].word, n, 1, outfile);
X hashtbl[i].word = (char *)strptr;
X strptr += n;
X }
X for (i = 0; i < hashsize; i++) {
X if (hashtbl[i].next != 0) {
X int x;
X x = hashtbl[i].next - hashtbl;
X hashtbl[i].next = (struct dent *)x;
X } else {
X hashtbl[i].next = (struct dent *)-1;
X }
X }
X fwrite (hashtbl, sizeof (struct dent), hashsize, outfile);
X hashheader.stringsize = strptr;
X rewind (outfile);
X fwrite (&hashheader, sizeof hashheader, 1, outfile);
X fclose (outfile);
X }
X
X filltable ()
X {
X struct dent *freepointer, *nextword, *dp;
X int i;
X
X for (freepointer = hashtbl; freepointer->used; freepointer++)
X ;
X for (nextword = hashtbl, i = numwords; i != 0; nextword++, i--) {
X if (nextword->used == 0) {
X continue;
X }
X if (nextword->next == NULL) {
X continue;
X }
X if (nextword->next >= hashtbl && nextword->next < hashtbl + hashsize) {
X continue;
X }
X dp = nextword;
X while (dp->next) {
X if (freepointer >= hashtbl + hashsize) {
X fprintf (stderr, "table overflow\n");
X getchar ();
X break;
X }
X *freepointer = *(dp->next);
X dp->next = freepointer;
X dp = freepointer;
X
X while (freepointer->used)
X freepointer++;
X }
X }
X }
X
X
X readdict ()
X {
X struct dent d;
X char lbuf[100];
X FILE *dictf;
X int i;
X int h;
X char *p;
X
X if ((dictf = fopen (Dfile, "r")) == NULL) {
X fprintf (stderr, "Can't open dictionary\n");
X exit (1);
X }
X
X hashtbl = (struct dent *) calloc (numwords, sizeof (struct dent));
X if (hashtbl == NULL) {
X fprintf (stderr, "couldn't allocate hash table\n");
X exit (1);
X }
X
X i = 0;
X while (fgets (lbuf, sizeof lbuf, dictf) != NULL) {
X if (i % 1000 == 0) {
X printf ("%d ", i);
X fflush (stdout);
X }
X i++;
X
X p = &lbuf [ strlen (lbuf) - 1 ];
X if (*p == '\n')
X *p = 0;
X
X if (makedent (lbuf, &d) < 0)
X continue;
X
X d.word = malloc (strlen (lbuf) + 1);
X if (d.word == NULL) {
X fprintf (stderr, "couldn't allocate space for word %s\n", lbuf);
X exit (1);
X }
X strcpy (d.word, lbuf);
X
X h = hash (lbuf, strlen (lbuf), hashsize);
X
X if (hashtbl[h].used == 0) {
X hashtbl[h] = d;
X
X } else {
X struct dent *dp;
X
X dp = (struct dent *) malloc (sizeof (struct dent));
X if (dp == NULL) {
X fprintf (stderr, "couldn't allocate space for collision\n");
X exit (1);
X }
X *dp = d;
X dp->next = hashtbl[h].next;
X hashtbl[h].next = dp;
X }
X }
X printf ("\n");
X }
X
X /*
X * fill in the flags in d, and put a null after the word in lbuf
X */
X
X makedent (lbuf, d)
X char *lbuf;
X struct dent *d;
X {
X char *p, *index();
X
d->next = NULL;
X d->used = 1;
X d->v_flag = 0;
X d->n_flag = 0;
X d->x_flag = 0;
X d->h_flag = 0;
X d->y_flag = 0;
X d->g_flag = 0;
X d->j_flag = 0;
X d->d_flag = 0;
X d->t_flag = 0;
X d->r_flag = 0;
X d->z_flag = 0;
X d->s_flag = 0;
X d->p_flag = 0;
X d->m_flag = 0;
X
X p = index (lbuf, '/');
X if (p != NULL)
X *p = 0;
X if (strlen (lbuf) > WORDLEN - 1) {
X printf ("%s: word too big\n", lbuf);
X return (-1);
X }
X
X if (p == NULL)
X return (0);
X
X p++;
X while (*p != '\0') {
X switch (*p) {
X case 'V': d->v_flag = 1; break;
X case 'N': d->n_flag = 1; break;
X case 'X': d->x_flag = 1; break;
X case 'H': d->h_flag = 1; break;
X case 'Y': d->y_flag = 1; break;
X case 'G': d->g_flag = 1; break;
X case 'J': d->j_flag = 1; break;
X case 'D': d->d_flag = 1; break;
X case 'T': d->t_flag = 1; break;
X case 'R': d->r_flag = 1; break;
X case 'Z': d->z_flag = 1; break;
X case 'S': d->s_flag = 1; break;
X case 'P': d->p_flag = 1; break;
X case 'M': d->m_flag = 1; break;
X case 0:
X fprintf (stderr, "no key word %s\n", lbuf);
X continue;
X default:
X fprintf (stderr, "unknown flag %c word %s\n",
X *p, lbuf);
X break;
X }
X p++;
X if (*p != '/' && *p != '\0' && *p != '\n') {
X fprintf (stderr, "bad format %s (%c 0%o)\n",
X lbuf, *p, *p);
X break;
X }
X if (*p)
X p++;
X
X }
X return (0);
X }
X
X newcount ()
X {
X char buf[200];
X FILE *d;
X int i;
X
X fprintf (stderr, "Counting words in dictionary ...\n");
X
X if ((d = fopen (Dfile, "r")) == NULL) {
X fprintf (stderr, "Can't open dictionary\n");
X exit (1);
X }
X
X i = 0;
X while (fgets (buf, sizeof buf, d) != NULL) {
X i++;
X if (i % 1000 == 0) {
X printf ("%d ", i);
X fflush (stdout);
X }
X }
X fclose (d);
X printf ("\n%d words\n", i);
X if ((d = fopen (Cfile, "w")) == NULL) {
X fprintf (stderr, "can't create %s\n", Cfile);
X exit (1);
X }
X fprintf (d, "%d\n", i);
X fclose (d);
X }
SHAR_EOF
if test 6459 -ne "`wc -c < 'buildhash.c'`"
then
echo shar: error transmitting "'buildhash.c'" '(should have been 6459 characters)'
fi
fi # end
of overwriting check
# End of shell archive
exit 0
--
Geoff Kuenning
{hplabs,ihnp4}!trwrb!desint!geoff
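The suffix rules spelled out in the expand.awk comments (flags D, T, R, Z, S, P and M) can be illustrated with a short C sketch. This is not code from ispell and not how ispell itself applies flags; apply_flag and is_vowel are hypothetical names, roots are assumed to be uppercase and non-empty, and a one-letter root ending in Y is assumed to keep its Y. The S rule adds ES after S, X, Z or H, matching the comments and expand2.sed.

```c
#include <string.h>

/* Nonzero if c is one of the vowels A, E, I, O, U. */
static int is_vowel(char c)
{
    return c != '\0' && strchr("AEIOU", c) != NULL;
}

/*
 * Hypothetical helper: apply one suffix flag to an uppercase root,
 * following the D, T, R, Z, S, P and M rules in the expand.awk comments.
 * out must hold at least strlen(root) + 6 bytes.
 * Returns 0 on success, -1 for an empty root or an unsupported flag.
 */
static int apply_flag(const char *root, char flag, char *out)
{
    static const char *const plain[] = { "ED", "EST", "ER", "ERS" };
    static const char *const after_e[] = { "D", "ST", "R", "RS" };
    static const char *const after_y[] = { "IED", "IEST", "IER", "IERS" };
    size_t n = strlen(root);
    size_t keep;                 /* number of root characters copied to out */
    const char *suffix;
    char last;
    int y_to_i;
    int k;

    if (n == 0)
        return -1;
    keep = n;
    last = root[n - 1];
    /* "...@Y" with @ not a vowel: the Y is replaced by I */
    y_to_i = (last == 'Y' && n >= 2 && !is_vowel(root[n - 2]));

    switch (flag) {
    case 'D': case 'T': case 'R': case 'Z':
        k = (flag == 'D') ? 0 : (flag == 'T') ? 1 : (flag == 'R') ? 2 : 3;
        if (last == 'E') {
            suffix = after_e[k];            /* CREATE -> CREATED */
        } else if (y_to_i) {
            keep = n - 1;
            suffix = after_y[k];            /* IMPLY -> IMPLIED */
        } else {
            suffix = plain[k];              /* CROSS -> CROSSED */
        }
        break;
    case 'S':
        if (y_to_i) {
            keep = n - 1;
            suffix = "IES";                 /* IMPLY -> IMPLIES */
        } else if (strchr("SXZH", last) != NULL) {
            suffix = "ES";                  /* FIX -> FIXES */
        } else {
            suffix = "S";                   /* BAT -> BATS */
        }
        break;
    case 'P':
        if (y_to_i) {
            keep = n - 1;
            suffix = "INESS";               /* CLOUDY -> CLOUDINESS */
        } else {
            suffix = "NESS";                /* LATE -> LATENESS */
        }
        break;
    case 'M':
        suffix = "'S";                      /* DOG -> DOG'S */
        break;
    default:
        return -1;
    }
    memcpy(out, root, keep);
    strcpy(out + keep, suffix);
    return 0;
}
```

For example, apply_flag("IMPLY", 'D', buf) leaves IMPLIED in buf, while apply_flag("CONVEY", 'D', buf) leaves CONVEYED because the Y follows a vowel. The three parallel tables keep the E, consonant-Y and plain cases of D, T, R and Z side by side, mirroring how the comments present them.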