Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!amdcad!ames!sri-spam!rutgers!rochester!PT.CS.CMU.EDU!SPEECH2.CS.CMU.EDU!kfl
From: kfl@SPEECH2.CS.CMU.EDU (Kai-Fu Lee)
Newsgroups: comp.ai
Subject: Re: Practical effects of AI (speech)
Message-ID: <267@PT.CS.CMU.EDU>
Date: Sat, 31-Oct-87 17:06:02 EST
Article-I.D.: PT.267
Posted: Sat Oct 31 17:06:02 1987
Date-Received: Thu, 5-Nov-87 05:20:41 EST
References: <12@gollum.Columbia.NCR.COM>
Sender: netnews@PT.CS.CMU.EDU
Organization: Carnegie-Mellon University, CS/RI
Lines: 55
Keywords: ai future effects

In article <12@gollum.Columbia.NCR.COM>, rolandi@gollum.Columbia.NCR.COM (rolandi) writes:
> 
> In article <6667@ut-ngp.UUCP> you write:
> >I have a question for people:
> >   What practical effects do you think AI will have in the next ten
> >years?
> >........[etc...]

> It would seem to me that the single greatest practical advancement for
> AI will be in speaker independent, continuous speech recognition. This
> is NOT to imply total computer "comprehension" in the sense of being
> able to carry on an unrestricted conversation.  I am NOT referring to
> abilities to process natural language.  That is a long way off, and
> will most likely come about as a function of a redefinition of the NLP
> problem in terms of a machine learning issue.  What "simple" speaker
> independent, continuous speech recognition will provide is the ultimate
> alternative to keyboard entry.  This would thereby provide all of  
> the functionality of current technology to anyone who could pronounce
> the commands.  This issue will have a major impact on the industry and
> on society.  By making "every body" a user, more machines will be sold,
> and because "every body" will have different needs, the range of
> automation will be widely extended.
> 

Those of us who work on speech will be very encouraged by this enthusiasm.
However,

(1) Speaker-independent continuous speech recognition is much farther
    from reality than some companies would have you think.  Currently,
    the best speech recognizer is IBM's Tangora, which has an error rate
    of about 6% on a 20,000-word vocabulary.  But the Tangora performs
    speaker-dependent, isolated-word, grammar-guided recognition in a
    benign environment.  Each of these four constraints, applied on its
    own, cuts the error rate by a factor of 3 or more.  I don't know how
    well a recognizer would do with all four constraints removed, but I
    would guess an error rate of about 70%.  So while speech recognition
    has made a lot of advances, it is still far from usable in the
    application you mentioned.
(2) Understanding spoken English is a harder problem than NLP of written
    English.  If you make the recognizer too constrained (small vocabulary,
    fixed syntax, etc.), it will be harder to use than a keyboard.  If you
    don't, you have to understand spoken English, which is really hard.
(3) If this product were to materialize, it is far from clear that it
    would be an advance for AI.  At present, the most promising
    techniques are based on stochastic modeling, pattern recognition,
    information theory, signal processing, auditory modeling, etc.
    So far, very few traditional AI techniques are used in, or work well
    for, speech recognition.
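A quick check on the numbers in point (1).  The post gives no formula for
combining the four factors, so this sketch simply assumes, hypothetically,
that each relaxed constraint multiplies the 6% baseline by exactly 3 and
that the effects compound independently.  Under that naive assumption the
error rate passes 100% once three constraints are removed, so the combined
effect cannot be a simple product; this is consistent with the 70% figure
being an educated guess rather than a calculation.

```python
# Naive compounding of the error-rate factors described in point (1).
# ASSUMPTION: each relaxed constraint multiplies the error rate by exactly 3
# and the effects compound independently.  The post only says "a factor of
# 3 or more" for each constraint on its own; it does not claim they compound.

BASELINE_ERROR = 0.06   # Tangora: about 6% errors with all four constraints
FACTOR = 3.0            # lower bound on the effect of each single constraint

CONSTRAINTS = [
    "speaker-dependent",
    "isolated-word",
    "grammar-guided",
    "benign environment",
]


def naive_error_rates(baseline, factor, n):
    """Error rate after relaxing 0..n constraints, if the factors simply multiplied."""
    return [baseline * factor ** k for k in range(n + 1)]


rates = naive_error_rates(BASELINE_ERROR, FACTOR, len(CONSTRAINTS))
for k, rate in enumerate(rates):
    # A "rate" above 100% is meaningless, which shows the naive model breaks down.
    note = "  (impossible: exceeds 100%)" if rate > 1.0 else ""
    print(f"{k} constraint(s) removed: {rate:.0%}{note}")
```

The naive product reaches 54% with two constraints removed and passes
100% at three, so real error rates must saturate well before the factors
are fully multiplied out.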
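As one concrete instance of the stochastic modeling mentioned in point (3),
here is a minimal sketch of Viterbi decoding over a toy hidden Markov model.
HMMs are one widely used stochastic approach to speech recognition, but the
post does not name a specific model; every state, observation symbol, and
probability below is hypothetical and chosen only for illustration.

```python
import math

# Toy hidden Markov model.  ASSUMPTION: the states, observation symbols, and
# all probabilities below are invented for illustration only.
STATES = ["s", "iy"]                      # hypothetical phone-like states
START = {"s": 0.8, "iy": 0.2}             # P(first state)
TRANS = {                                 # P(next state | current state)
    "s":  {"s": 0.6, "iy": 0.4},
    "iy": {"s": 0.1, "iy": 0.9},
}
EMIT = {                                  # P(acoustic label | state)
    "s":  {"hiss": 0.7, "voiced": 0.3},
    "iy": {"hiss": 0.1, "voiced": 0.9},
}


def viterbi(observations):
    """Return (log-probability, most likely state sequence) for the observations."""
    # best[state] = (log prob of the best path ending in state, that path)
    best = {
        st: (math.log(START[st]) + math.log(EMIT[st][observations[0]]), [st])
        for st in STATES
    }
    for obs in observations[1:]:
        new_best = {}
        for st in STATES:
            # Keep only the predecessor that gives the best score into `st`.
            score, path = max(
                (best[prev][0] + math.log(TRANS[prev][st]), best[prev][1])
                for prev in STATES
            )
            new_best[st] = (score + math.log(EMIT[st][obs]), path + [st])
        best = new_best
    return max(best.values())


score, path = viterbi(["hiss", "hiss", "voiced", "voiced"])
print(path)  # ['s', 's', 'iy', 'iy']
```

The decoder keeps only the best-scoring path into each state at each step,
so its work grows linearly with the length of the utterance rather than
exponentially with the number of possible state sequences.  Working in log
probabilities avoids numerical underflow on long inputs.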
> 
> -w.rolandi
> ncrcae!gollum!rolandi

Kai-Fu Lee
Computer Science Department
Carnegie-Mellon University