Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!amdcad!ames!sri-spam!rutgers!rochester!PT.CS.CMU.EDU!SPEECH2.CS.CMU.EDU!kfl
From: kfl@SPEECH2.CS.CMU.EDU (Kai-Fu Lee)
Newsgroups: comp.ai
Subject: Re: Practical effects of AI (speech)
Message-ID: <267@PT.CS.CMU.EDU>
Date: Sat, 31-Oct-87 17:06:02 EST
Article-I.D.: PT.267
Posted: Sat Oct 31 17:06:02 1987
Date-Received: Thu, 5-Nov-87 05:20:41 EST
References: <12@gollum.Columbia.NCR.COM>
Sender: netnews@PT.CS.CMU.EDU
Organization: Carnegie-Mellon University, CS/RI
Lines: 55
Keywords: ai future effects

In article <12@gollum.Columbia.NCR.COM>, rolandi@gollum.Columbia.NCR.COM (rolandi) writes:
>
> In article <6667@ut-ngp.UUCP> you write:
> >I have a question for people:
> > What practical effects do you think AI will have in the next ten
> >years?
> >........[etc...]
> It would seem to me that the single greatest practical advancement for
> AI will be in speaker independent, continuous speech recognition. This
> is NOT to imply total computer "comprehension" in the sense of being
> able to carry on an unrestricted conversation. I am NOT referring to
> abilities to process natural language. That, is a long way off, and
> will most likely come about as a function of a redefinition of the NLP
> problem in terms of a machine learning issue. What "simple" speaker
> independent, continuous speech recognition will provide is the ultimate
> alternative to keyboard entry. This would thereby provide all of
> the functionality of current technology to anyone who could pronounce
> the commands. This issue will have a major impact on the industry and
> on society. By making "every body" a user, more machines will be sold,
> and because "every body" will have different needs, the range of
> automation will be widely extended.
>

Those of us who work on speech will be very encouraged by this enthusiasm.
However,

(1) Speaker-independent continuous speech recognition is much farther
    from reality than some companies would have you think. Currently,
    the best speech recognizer is IBM's Tangora, which makes about 6%
    errors on a 20,000-word vocabulary. But the Tangora is for
    speaker-dependent, isolated-word, grammar-guided recognition in a
    benign environment. Each of these four constraints cuts the error
    rate by a factor of 3 or more if used independently. I don't know
    how well it will do if you remove all four constraints, but I would
    guess about a 70% error rate. So while speech recognition has made
    a lot of advances, it is still far from usable in the application
    you mentioned.

(2) Spoken English is a harder problem than NLP of written English.
    If you make the recognizer too constrained (small vocabulary, fixed
    syntax, etc.), it will be harder to use than a keyboard. If you
    don't, you have to understand spoken English, which is really hard.

(3) If this product were to materialize, it is far from clear that it
    would be an advancement for AI. At present, the most promising
    techniques are based on stochastic modeling, pattern recognition,
    information theory, signal processing, auditory modeling, etc.
    So far, very few traditional AI techniques are used in, or work
    well for, speech recognition.

>
> -w.rolandi
> ncrcae!gollum!rolandi

Kai-Fu Lee
Computer Science Department
Carnegie-Mellon University