Xref: utzoo comp.sys.mac:25041 comp.cog-eng:876 sci.lang:3914
Path: utzoo!attcan!uunet!lll-winken!ames!mailrus!tut.cis.ohio-state.edu!bloom-beacon!bu-cs!bucasb!merrill
From: merrill@bucasb (John Merrill)
Newsgroups: comp.sys.mac,comp.cog-eng,sci.lang
Subject: Re: Why are there no Speech Recognition products for the Mac??
Keywords: Voice Recognition, Voice Synthesis, Speech, Voice Response
Message-ID: <600634966.15179@bucasb.bu.edu>
Date: 12 Jan 89 19:02:46 GMT
References: <2972@uhccux.uhcc.hawaii.edu> <1029@ditsyda.oz>
Reply-To: merrill@bucasb (John Merrill)
Followup-To: comp.sys.mac
Organization: Boston University Center for Adaptive Systems
Lines: 72
In-reply-to: vincent@ditsyda.oz (David A. Vincent)

In article <1029@ditsyda.oz>, vincent@ditsyda (David A. Vincent) writes:
>
>
>in article <2972@uhccux.uhcc.hawaii.edu>, pam@uhccux.uhcc.hawaii.edu (.) says:
>> 
>> In article <6890> pardo@cs.washington.edu (David Keppel) writes:
>> | >>>Good speech recognition hardware can't be more than 5 or 10
>       ****
>> | >>>years away, can it?
>> | >>>-Peter Schachte
>> -- It's here!
>
>No, it is not here.  

No, indeed, it is not here.  It is still a long ways away, in fact.

Fact:

There is *one* speaker-independent, continuous speech recognition
system *in existence* (or maybe two).  There are no commercial
systems extant.  That one system, K-F Lee's SPHINX system, runs on
"several SUN-4's with floating point coprocessor boards"...and ties
them all down.  Furthermore, although it is not an isolated-word
system, it can only handle a finite vocabulary in a *very* limited
grammar.

I have seen the Lincoln Labs derivative of SPHINX in operation.  It's
only about 50X real-time, and it isn't bad...if you're running in an
absolutely silent room.  But it most certainly *isn't* continuous,
speaker-independent recognition.
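
For a sense of scale, here is a back-of-the-envelope sketch.  It assumes
"50X real-time" means roughly 50 seconds of computation per second of
speech; that reading, and the function name, are illustrative
assumptions, not measurements of the system.

```python
def processing_seconds(audio_seconds, real_time_factor=50.0):
    """Estimate compute time for an utterance.

    real_time_factor is the assumed ratio of processing time to
    audio duration (50.0 is the rough figure quoted above).
    """
    return audio_seconds * real_time_factor


# A three-second sentence would keep the machine busy for 150 seconds.
wait = processing_seconds(3.0)
print(f"{wait:.0f} s ({wait / 60:.1f} min)")
```

Under that assumption, a short spoken command costs a couple of
minutes of waiting, which is why "only" 50X is still far from
interactive use.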

(But let me say one thing.  SPHINX is a major advance in the design
and construction of speech recognizers.  No, it ain't perfect...but
it's orders of magnitude better than anything that came before.  I
was absolutely astounded when it was announced; it's so much better
than anything else around.  Since I've seen it (or, rather, something
very much like it), I'm even more impressed.  I just can't convey how
much of an advance it was over the older systems.)

>Also, I doubt that the so-called 'speaker independent' systems
>mentioned above will really recognize *anybody's* voice.   What about
>people speaking with strong accents?  Or in perfect 'english' but over
>background noise?  

I haven't seen any of the new generation of recognizers tested on
accented English, but the one I have seen can deal with a variety of
speaking tempi and conditions (yelling, noise-in-ears, deafened,
etc.).  As I said before, it didn't deal well with noisy environments.

On the other hand, there is an accumulating body of evidence that
problems with background noise can be ameliorated by the use of
non-standard representations of the input stream, some of which appear
to be better able to extract signal from background.

>>    *** So where are the Voice Recognition systems for the Mac??? ***
>
>Yes, where?  But, by the way, what is voice (as opposed to speech)
>recognition?  (Or is there no difference?  In normal discussion, 
>'voice' is rarely interchangeable with 'speech'.)  

There is a difference.  Voice recognition is talker identification (at
least, in my jargon).  It's much easier than speech recognition.
(You can replace speaker dependence with text dependence, and then
identify the speaker that spoke your fixed text, as opposed to
identifying the text spoken by your fixed speaker.)
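
The symmetry in that parenthetical can be sketched with a toy
nearest-template matcher.  Everything here is hypothetical: the
feature vectors are made up, the speaker names and words are
placeholders, and nearest-template matching is just a convenient
stand-in, not how SPHINX or any real system works.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(templates, sample):
    """Return the label whose stored template is closest to the sample."""
    return min(templates, key=lambda label: distance(templates[label], sample))


# Made-up feature vectors, indexed by (speaker, text).
TEMPLATES = {
    ("alice", "open"): [1.0, 0.2, 0.1],
    ("alice", "close"): [0.9, 0.8, 0.7],
    ("bob", "open"): [0.2, 0.3, 0.1],
    ("bob", "close"): [0.1, 0.9, 0.8],
}


def identify_speaker(fixed_text, sample):
    """Text-dependent talker identification: the text is known, the speaker is not."""
    candidates = {spk: vec for (spk, txt), vec in TEMPLATES.items()
                  if txt == fixed_text}
    return nearest_label(candidates, sample)


def recognize_text(fixed_speaker, sample):
    """Speaker-dependent speech recognition: the speaker is known, the text is not."""
    candidates = {txt: vec for (spk, txt), vec in TEMPLATES.items()
                  if spk == fixed_speaker}
    return nearest_label(candidates, sample)


print(identify_speaker("open", [0.25, 0.3, 0.1]))
print(recognize_text("alice", [0.95, 0.75, 0.7]))
```

The two functions differ only in which half of the (speaker, text)
pair is held fixed.  The sketch shows the symmetry of the setup and
nothing more; it says nothing about which problem is easier in
practice.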
--
John Merrill			|	ARPA:	merrill@bucasb.bu.edu
Center for Adaptive Systems	|	
111 Cummington Street		|	
Boston, Mass. 02215		|	Phone:	(617) 353-5765