Path: utzoo!attcan!uunet!mcsun!sunic!uupsi!nyser!rpi!zaphod.mps.ohio-state.edu!swrinde!cs.utexas.edu!rutgers!jarvis.csri.toronto.edu!clyde.concordia.ca!ireqs3!ireq-robot!gamin
From: gamin@ireq-robot.UUCP (Martin Boyer)
Newsgroups: comp.text
Subject: Re: wanted "French digital dictionary"
Summary: Is simple-minded better than nothing?
Keywords: Spell, French, conjugation
Message-ID: <112@amadeus.UUCP>
Date: 12 Jan 90 06:14:36 GMT
Reply-To: gamin@ireq-robot.UUCP (Martin Boyer)
Organization: Laboratoire de robotique, Institut de recherche d'Hydro-Quebec
Lines: 57

clement@opus.UUCP (Clement Pellerin):
>We would buy a French digital dictionary if only we could find one.

gamin@ireq-robot.UUCP (Martin Boyer):
>[Let's get organized and see how we can hack a version of spell to handle
>French]

lamy@cs.utoronto.ca (Jean-Francois Lamy):
>Spelling checking is more difficult in languages where number and gender
>agreement is an issue. A simple-minded approach like that of spell or ispell
>would give you an immense number of false errors.
>[...]
>So let me go on record as extremely skeptical that anything
>useful would come out of a simple minded approach [...]

clement@opus.cs.mcgill.ca (Clement Pellerin):
>[...]
>Nevertheless, I consider that simple minded help is better than nothing. I
>would settle for anything that would lookup every word in the dictionary to
>see if it is present or not. [...]

I am an optimist by nature, and sometimes by choice, because it helps to get things done. I would go with Clement Pellerin and say that even an incomplete solution would help. I would be happy with a database of nouns and no verbs except the numerous variations of "avoir" (to have) and "être" (to be), because this is where we would get the most for our investment.

Jean-Francois is probably right, however, in pointing out that a simple-minded approach will yield an immense number of false errors.
It is quite possible that "filtering" a perfectly correct text through such a checker would produce a list of all the words, or variations of words, that our French speller doesn't know about. Is there a way to have a "loose" checker that would only flag words that are misspelled "for sure" and disregard the words it doesn't know about? For instance, something like the soundex algorithm, which hashes words based on their pronunciation instead of their spelling, could recognize that a word is "close enough" to a dictionary entry but not exactly the same, possibly because it is misspelled. Words that have no close match in the dictionary would simply be "unknown". Perhaps certain features of French can be used, such as the fact that you can't have four consonants in a row (and three only in certain cases), or that certain sequences of letters are not part of any French word.

I'd like to hear comments about the "brute force" approach: listing all non-trivial words in the dictionary. How big would it be? Even if slow, is it practical?

--
Martin Boyer                          ireq-robot!mboyer@Larry.McRCIM.McGILL.EDU
Institut de recherche d'Hydro-Quebec  mboyer@ireq-robot.uucp
Varennes, QC, Canada  J3X 1S1
+1 514 652-8136
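The "loose" checker proposed in this post can be sketched in Python as a three-step test. It tries an exact lookup first, then a letter-sequence filter, then a Soundex near-match.

This is only a minimal sketch, and it rests on several assumptions:
- `LooseChecker` and its labels are hypothetical names.
- The phonetic key is the classic American Soundex, which was designed for English surnames. It is applied after stripping accents and is not a French-specific algorithm.
- The consonant rule is a parameter because the four-in-a-row limit has counterexamples. "instruction" and "construire" both contain "nstr", so the default here rejects only runs longer than four.

```python
import unicodedata
from collections import defaultdict

# American Soundex digit classes (an English-oriented assumption, not a
# French phonetic scheme); letters not listed here get no digit.
_CODES = {c: d for d, letters in (("1", "bfpv"), ("2", "cgjkqsxz"),
                                  ("3", "dt"), ("4", "l"), ("5", "mn"),
                                  ("6", "r"))
          for c in letters}
_VOWELS = set("aeiouy")


def strip_accents(word):
    """Fold 'é' to 'e', 'ç' to 'c', 'œ' to 'oe', and so on."""
    word = word.replace("œ", "oe").replace("Œ", "OE")
    word = word.replace("æ", "ae").replace("Æ", "AE")
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")


def soundex(word):
    """Four-character American Soundex key of an accent-stripped word."""
    letters = [c for c in strip_accents(word).lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    key, prev = letters[0].upper(), _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            key += code
        if c not in "hw":  # vowels separate repeated digits; h and w do not
            prev = code
        if len(key) == 4:
            break
    return key.ljust(4, "0")


def longest_consonant_run(word):
    """Length of the longest run of consecutive non-vowel letters."""
    run = best = 0
    for c in strip_accents(word).lower():
        if c.isalpha() and c not in _VOWELS:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


class LooseChecker:
    """Hypothetical 'loose' checker: flag likely misspellings only."""

    def __init__(self, lexicon, max_consonants=4):
        self.lexicon = {w.lower() for w in lexicon}
        self.max_consonants = max_consonants  # assumed limit, not a verified rule
        self.by_key = defaultdict(set)  # Soundex key -> dictionary entries
        for entry in self.lexicon:
            self.by_key[soundex(entry)].add(entry)

    def check(self, word):
        """Return (label, candidates): known, impossible, suspect or unknown."""
        w = word.lower()
        if w in self.lexicon:
            return "known", set()
        if longest_consonant_run(w) > self.max_consonants:
            return "impossible", set()  # letter sequence ruled out
        candidates = self.by_key.get(soundex(w), set())
        if candidates:
            return "suspect", set(candidates)  # sounds like a known word
        return "unknown", set()  # no close match: leave it alone


checker = LooseChecker(["avoir", "être", "maison", "dictionnaire", "instruction"])
for word in ["maison", "dictionaire", "etre", "instrcution", "zébu"]:
    print(word, checker.check(word))
```

With this toy lexicon, "dictionaire" and "etre" come back as suspect near-misses of "dictionnaire" and "être". "instrcution" is rejected by the consonant filter, and "zébu" is left as unknown.

Two caveats follow from the design. First, a Soundex key keeps only the first letter and three digits. With a full dictionary, many correct but unknown words would therefore share a key with some entry and be reported as suspect. Second, because the first letter is kept literally, "farmacie" (F652) will never match "pharmacie" (P652). A key tuned to French spelling and pronunciation would be needed for the idea to pay off.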