Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!munnari.oz.au!ariel!ray
From: ray@ariel.ucs.unimelb.edu.au (Douglas Ray)
Newsgroups: comp.dsp
Subject: Re: Speech recognition: state-of-the-art ?
Message-ID: <377@ariel.ucs.unimelb.edu.au>
Date: 24 Dec 90 05:58:33 GMT
References: <1990Dec19.215611.10659@mmm.serc.3m.com>
Organization: University of Melbourne
Lines: 63

From article <1990Dec19.215611.10659@mmm.serc.3m.com>, by schultz@halley.tmc.edu (John C. Schultz):
> In article <1990Dec17.202616.3021@quagga.ru.ac.za> csirpd@quagga.ru.ac.za (Paul Ducklin) writes:
>>
>>  * for a specific voice, and a non-mega$ desktop machine, what's
>>    a good recognition vocabulary? 5000 words? 10000 words? I
>>
>>  * for "generic voice" (eg: all American-speaking females), what is
>>    a good vocabulary? What sort of reliability is attainable?
>>
>>  * what's good vis-a-vis "natural" or continuous speech? How
>>    capable are recognition systems at handling speech without
>>    staccato-type interword pauses?
> 
>    * How reliable is voice recognition in noisy environments with
>      respect to vocabulary size?  For example if the system only
>      needs to recognize 50 or so words, is that more robust than 
>      a 5000 word system? How much more reliable? Don't know answers
>      would be preferable to wrong answers.

initial response:

I'm not qualified in this field, but if I haven't misinterpreted the
figures, here are summaries of papers presented at the 3rd International
Conference on Speech Science and Technology, Melbourne, Australia,
November 1990.

The general convention at the conference was to call vocabs of 20 - 200
words "small" and those of 500 - 1000 words "large".

[only first authors quoted]

  C. Rowles (Telecom Australia)
    state of the art for speaker-independent, continuous speech, modest vocab.
    (200-500w?): 95% word recognition.

This 95% figure comes up a lot:

  W.A. Smith (Waikato, N.Z.)
    presented a feature selection algorithm for speaker-independent, isolated
    word recognition, vocab. 20w: 95% word recognition

  Tracy Clark (Canterbury, N.Z.)
    compares various methods in isolation, comments on accent dependence;
    speaker-dependent, isolated word, 10w vocab.: best up to 96% word
    recognition

but for larger vocabs you can't expect this:

  Tony Robinson (Cambridge, U.K.)
    Preliminary work on word recognition without grammatical constraints:
    speaker-independent, continuous speech, using the DARPA 1000 word
    Resource Management Task: 52.1% word recognition rate (43.3% accuracy),
    but quotes the Sphinx system at 81.9%.
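The gap between Robinson's two figures is presumably due to insertions.
Assuming the usual DARPA-era scoring convention (the summary above doesn't
spell it out), the recognizer output is aligned against the reference
transcription by minimum edit distance, giving N reference words, S
substitutions, D deletions and I insertions. "Word recognition" (percent
correct) is then (N - S - D)/N, while "accuracy" is (N - S - D - I)/N, so
spurious extra words lower only the second figure. A minimal Python sketch
under those assumptions; the function names and example sentences are
made up for illustration:

```python
def align_counts(ref, hyp):
    """Align two word lists by minimum edit distance (unit costs).

    Returns (substitutions, deletions, insertions) for a lowest-cost
    alignment; among equal-cost alignments, fewer substitutions win.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] holds (errors, subs, dels, ins) for ref[:i] against hyp[:j];
    # tuples compare lexicographically, so min() picks fewest errors first.
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, k = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, k)          # match: no error
            else:
                diag = (e + 1, s + 1, d, k)  # substitution
            e, s, d, k = cost[i - 1][j]
            up = (e + 1, s, d + 1, k)        # deletion of ref[i-1]
            e, s, d, k = cost[i][j - 1]
            left = (e + 1, s, d, k + 1)      # insertion of hyp[j-1]
            cost[i][j] = min(diag, up, left)
    _, s, d, k = cost[n][m]
    return s, d, k


def word_scores(ref, hyp):
    """Return (percent_correct, accuracy) as fractions of len(ref)."""
    n = len(ref)
    s, d, k = align_counts(ref, hyp)
    correct = (n - s - d) / n       # ignores insertions
    accuracy = (n - s - d - k) / n  # penalises insertions; can go negative
    return correct, accuracy


ref = "list the ships in port".split()
hyp = "list a the ship in port".split()
print(word_scores(ref, hyp))  # (0.8, 0.6)
```

Here "ships" -> "ship" is one substitution and "a" is one insertion, so 4
of 5 words are correct but accuracy is only 3/5. Real scoring tools may
break ties between equal-cost alignments differently; that can shift the
percent-correct figure slightly, but not accuracy, which depends only on
the total S + D + I.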

There was also some work on language recognition, eg:

  Walter Weigel (Munich, Germany)
    speaker-independent, continuous speech, 132w vocab., 40-rule context-free
    grammar subset of German: 74% sentence recognition
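Sentence recognition is a stricter measure than word recognition.
Assuming the usual definition (Weigel's exact scoring rules aren't given
in this summary), a sentence counts only if the whole recognized word
string matches the reference, so a single word error anywhere makes the
sentence wrong. A minimal Python sketch of that scoring; the function name
and test sentences are hypothetical:

```python
def sentence_recognition_rate(pairs):
    """Fraction of (reference, hypothesis) pairs whose word strings match exactly."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one sentence")
    # Compare word lists, so differences in spacing alone don't count as errors.
    exact = sum(1 for ref, hyp in pairs if ref.split() == hyp.split())
    return exact / len(pairs)


tests = [
    ("show all trains to Bonn", "show all trains to Bonn"),
    ("when does the train leave", "when does a train leave"),
]
print(sentence_recognition_rate(tests))  # 0.5
```

The second test sentence has only one wrong word out of five, yet it
scores as a complete failure at the sentence level.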

[The conference proceedings contain around 80 papers in over 500 pp.;
inquiries to the Secretary, Australian Speech Science and Technology
Association, GPO Box 143, Canberra ACT 2601, Australia]