Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!munnari.oz.au!ariel!ray
From: ray@ariel.ucs.unimelb.edu.au (Douglas Ray)
Newsgroups: comp.dsp
Subject: Re: Speech recognition: state-of-the-art ?
Message-ID: <377@ariel.ucs.unimelb.edu.au>
Date: 24 Dec 90 05:58:33 GMT
References: <1990Dec19.215611.10659@mmm.serc.3m.com>
Organization: University of Melbourne
Lines: 63

From article <1990Dec19.215611.10659@mmm.serc.3m.com>, by schultz@halley.tmc.edu (John C. Schultz):
> In article <1990Dec17.202616.3021@quagga.ru.ac.za> csirpd@quagga.ru.ac.za (Paul Ducklin) writes:
>>
>> * for a specific voice, and a non-mega$ desktop machine, what's
>>   a good recognition vocabulary? 5000 words? 10000 words? I
>>
>> * for "generic voice" (eg: all American-speaking females), what is
>>   a good vocabulary? What sort of reliability is attainable?
>>
>> * what's good vis-a-vis "natural" or continuous speech? How
>>   capable are recognition systems at handling speech without
>>   staccato-type interword pauses?
>
> * How reliable is voice recognition in noisy environments with
>   respect to vocabulary size? For example if the system only
>   needs to recognize 50 or so words, is that more robust than
>   a 5000 word system? How much more reliable? Don't know answers
>   would be preferable to wrong answers.

initial response:

I'm not qualified in this field, but if I haven't misinterpreted
the figures, here are summaries from papers presented at the 3rd
International Conference on Speech Science and Technology,
Melbourne, Australia, November 1990.

The general attitude at the conference was to quote "small" vocabs
as 20 - 200 words, and large as 500 - 1000 words.

[only first authors quoted]

C. Rowles (Telecom Australia)
    state of the art for speaker independent, continuous speech,
    modest vocab. (200-500w ?): 95% word recognition.

This 95% figure comes up a lot:

W.A. Smith (Waikato, N.Z.)
    presented a feature selection algorithm for speaker independent,
    isolated word recognition, vocab.
    20w: 95% word recognition

Tracy Clark (Canterbury, N.Z.)
    compares various methods in isolation, comments on accent
    dependence; speaker dependent, isolated word, 10w vocab.:
    best up to 96% word recognition

but for larger vocabs you can't expect this:

Tony Robinson (Cambridge, U.K.)
    Preliminary work on word recognition without grammatical
    constraints: speaker independent, continuous speech, using the
    DARPA 1000 word Resource Management Task: 52.1% word recognition
    rate (43.3% accuracy), but quotes the Sphinx system at 81.9%.

There was also some work on language recognition, eg:

Walter Weigel (Munich, Germany)
    speaker independent, continuous speech, 132w vocab., 40 rule
    context-free grammar subset of German: 74% sentence recognition

[The conference proceedings contain around 80 papers in over 500 pp.;
inquiries to the Secretary, Australian Speech Science and Technology
Association, GPO Box 143, Canberra ACT 2601, Australia]
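
On the two Robinson figures quoted above (52.1% recognition rate vs. 43.3% accuracy): I am assuming the usual scoring convention here, which the summary itself does not spell out. Under that convention the "recognition rate" counts only substitutions and deletions as errors, while "accuracy" also subtracts inserted words. A minimal sketch follows; the function name and the error counts are made up for illustration and are not taken from the paper.

```python
def word_scores(n_ref, subs, dels, ins):
    """Return (percent_correct, percent_accuracy) for a scored transcript.

    Assumed convention (not stated in the post):
      percent correct = (N - S - D) / N
      accuracy        = (N - S - D - I) / N
    where N is the number of reference words, S substitutions,
    D deletions and I insertions.
    """
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    hits = n_ref - subs - dels
    percent_correct = 100.0 * hits / n_ref
    percent_accuracy = 100.0 * (hits - ins) / n_ref
    return percent_correct, percent_accuracy


# Hypothetical counts, chosen only so the two scores differ by the
# insertion term; they are NOT the counts from the paper.
correct, accuracy = word_scores(n_ref=1000, subs=300, dels=179, ins=88)
print(f"correct: {correct:.1f}%  accuracy: {accuracy:.1f}%")
```

If that convention is the one used, the gap between the two figures is just the insertion count as a percentage of the reference words, which would explain why a continuous-speech system without a grammar shows a noticeably lower accuracy than recognition rate.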