Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!uunet!stanford.edu!csli!kornai
From: kornai@csli.Stanford.EDU (Andras Kornai)
Newsgroups: comp.compression
Subject: Re: Voice Standards
Message-ID: <19645@csli.Stanford.EDU>
Date: 4 Jun 91 02:30:37 GMT
References: <15060@hacgate.UUCP> <1223@ocsmd.com> <1991Jun3.203049.7349@sol.cs.wmich.edu>
Distribution: na
Organization: Center for the Study of Language and Information, Stanford U.
Lines: 20

In <1991Jun3.203049.7349@sol.cs.wmich.edu> campbell@sol.cs.wmich.edu (Paul Campbell) writes:

>In LPC coding, the best numbers I have seen were from Dr.
>Markhoul (did I get the name right? I don't have my notes pile nearby), in
>which he got the bit rate down to a more reasonable 150 bps or less.

Dream on! Your average language has some 40 to 60 phonemes, so you
need 5 to 6 bits/phoneme (log2 of 40 is about 5.3, log2 of 60 about
5.9). In moderately fast speech there are 15 phonemes/sec -- this
gives an absolute lower limit around 80 bps. If you want to code
intonation/prosody at all, you will need another 4 bits/phoneme,
bringing it up to 100-200 bps, depending on speech rate.
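
For anyone who wants to check the arithmetic, here is a rough sketch
in Python. It assumes a fixed-length code with every phoneme equally
likely, which is the simplification behind the figures above. The
function name and the 10 and 20 phonemes/sec rates used for slow and
fast speech are my own illustrative choices, not measured values.

```python
import math

def phoneme_bitrate(inventory_size, phonemes_per_sec, prosody_bits=0.0):
    """Bits/sec to send a phoneme stream, assuming equiprobable phonemes."""
    # log2(inventory_size) bits identify one phoneme; prosody_bits is the
    # extra per-phoneme cost of intonation (4 bits in the estimate above).
    bits_per_phoneme = math.log2(inventory_size) + prosody_bits
    return bits_per_phoneme * phonemes_per_sec

for size in (40, 60):
    bare = phoneme_bitrate(size, 15)
    slow = phoneme_bitrate(size, 10, prosody_bits=4)  # assumed slow rate
    fast = phoneme_bitrate(size, 20, prosody_bits=4)  # assumed fast rate
    print(f"{size} phonemes: {bare:.0f} bps bare, "
          f"{slow:.0f}-{fast:.0f} bps with prosody")
```

With 15 phonemes/sec the bare figure comes out at 80 to 89 bps, and
with the 4 prosody bits the assumed 10 to 20 phonemes/sec range lands
at roughly 93 to 198 bps.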

John Makhoul's work at BBN is at the cutting edge of research -- it is
very far from a product, even farther from a standard. I think he is
at 300 bps, using some very sophisticated vector quantization (VQ) and
linear predictive coding (LPC) techniques. That is pretty impressive
compared to the 9.6 kbps now more or less standard for voice
compression, but it makes decompressed speech sound pretty synthetic.

Andras Kornai (kornai@csli.stanford.edu)