Path: utzoo!utgpu!jarvis.csri.toronto.edu!neat.cs.toronto.edu!lamy
Newsgroups: comp.text
From: lamy@cs.utoronto.ca (Jean-Francois Lamy)
Subject: Re: wanted "French digital dictionary"
Message-ID: <90Jan11.141206est.2694@neat.cs.toronto.edu>
References: <1838@opus.cs.mcgill.ca> <110@pellan.UUCP> <12755@cgl.ucsf.EDU>
Date: 11 Jan 90 19:13:02 GMT
Lines: 25

Spelling checking is more difficult in languages where number and
gender agreement is an issue. A simple-minded approach like that of
spell or ispell would give you an immense number of spurious errors.
In the case of French you would at least need all conjugated variants
of French verbs, and a way to deal with accents properly, and I claim
that would still not be enough.

I am aware of one effort in the early 80's to build a full
morphological dictionary, i.e. one that has enough data to support
both conjugation and lemmatisation (given the word "fusse", figure out
that it is a form of the verb "e^tre"). If I remember correctly there
are over 180 forms of verb conjugation in French; forget about those
3 groups and a few irregular ones you learned about in high school :-)
As far as I recall the project got mired in a feud about
copyright/licensing issues, and was never distributed or
commercialized.

The reason I bring this up is that someone tried to do spelling
verification using that data, and found out that it is a much harder
problem than one might think.

So let me go on record as extremely skeptical that anything useful
would come out of a simple-minded approach, and that what works for
English (spell/ispell) will not carry over to other languages (like
French) where word morphology is subject to weird and wonderful
transmutations when changing gender, number or tense.

Jean-Francois Lamy
lamy@cs.utoronto.ca, uunet!cs.utoronto.ca!lamy
Department of Computer Science, University of Toronto, Canada M5S 1A4
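
To make the lemmatisation point concrete, here is a minimal sketch in Python of the difference between a plain word-list lookup and a full-form table that maps each inflected form back to its lemma. It is not the dictionary project described in the post, and it is not how spell or ispell work internally. The names BASE_WORDS, FULL_FORMS, naive_check and lemmatise are hypothetical, and the handful of entries is toy data chosen for illustration ("être" is the verb written "e^tre" in the post).

```python
# Toy data only: a handful of entries, nowhere near a real French lexicon.
# BASE_WORDS stands in for a word list that holds only dictionary headwords.
BASE_WORDS = {"être", "avoir", "cheval", "blanc"}

# Hypothetical full-form table: inflected form -> (lemma, description).
FULL_FORMS = {
    "fusse": ("être", "imperfect subjunctive, 1st person singular"),
    "fussent": ("être", "imperfect subjunctive, 3rd person plural"),
    "chevaux": ("cheval", "noun, masculine plural"),
    "blanche": ("blanc", "adjective, feminine singular"),
}


def naive_check(word):
    """Word-list lookup: accept a word only if it is a listed headword."""
    return word.lower() in BASE_WORDS


def lemmatise(word):
    """Return (lemma, description) for a known form, or None if unknown."""
    w = word.lower()
    if w in FULL_FORMS:
        return FULL_FORMS[w]
    if w in BASE_WORDS:
        return (w, "base form")
    return None


if __name__ == "__main__":
    for form in ["fusse", "chevaux", "blanche", "xyzzy"]:
        print(form, naive_check(form), lemmatise(form))
```

With this toy data the headword list rejects "fusse" even though it is a valid form, while the full-form table resolves it to "être". Note that even the full table only validates words in isolation: a phrase whose words are each valid but disagree in number or gender would still pass, which is in line with the skepticism expressed in the post.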