Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!mips!pacbell.com!att!cbnewsh!wcs
From: wcs@cbnewsh.att.com (Bill Stewart 908-949-0705 erebus.att.com!wcs)
Newsgroups: comp.compression
Subject: Re: Voice Compression
Message-ID: <1991Jun19.224821.19923@cbnewsh.att.com>
Date: 19 Jun 91 22:48:21 GMT
References: <1991Jun12.191313.16540@qualcomm.com>
Organization: AT&T Bell Labs Special Services Division
Lines: 56

In article <1991Jun12.191313.16540@qualcomm.com> rdippold@lajolla.qualcomm.com (Ron Dippold) writes:
] I'm looking at different methods of sound compression for voice.

What kind of bit rate do you need? Are you sure it will all be voice, and not modem data? How much are you willing to spend on a compressor? What sound quality do you need: decent speech where you can recognize the speaker, or synthetic Speak & Spell (tm) quality? I assume it needs to run in real time? Is it important to use a standard compression algorithm (probably)?

There are lots of kinds of voice compression, and AT&T (and presumably the other phone companies) have done infinite amounts of research :-) The AT&T Technical Journal (formerly the Bell System Technical Journal, the AT&T Bell Laboratories Technical Journal, etc.) often has articles on speech compression. We've also published a number of books, and articles in many other journals; I suspect the various IEEE journals are a good place to look, as are books on Digital Signal Processing (DSP). (Disclaimer: I'm not a speech hacker.)

Typical telephone voice is 64 kbps: 8000 samples per second at 8 bits each (non-linearly companded; a linear encoding with similar dynamic range would need about 13-14 bits). Compression to 32 kbps is easy using ADPCM (Adaptive Differential Pulse Code Modulation; the basic 64 kbps encoding is ordinary PCM). A lot of commercial telecom equipment does 32 kbps ADPCM.
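For the curious, here is a small Python sketch of the rate arithmetic and the two ideas above. It assumes the continuous mu-law curve with mu = 255 (real G.711 codecs use a segmented piecewise-linear approximation of it and a different 8-bit code layout), and the ADPCM part is a hypothetical toy coder, not the standardized G.721/G.726 algorithm: it predicts each sample as the previous reconstructed sample, sends a 4-bit quantized difference, and adapts the step size. All function names are made up for illustration.

```python
import math

SAMPLE_RATE = 8000  # telephone voice: 8000 samples per second


def bit_rate(bits_per_sample: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Raw bit rate in bits per second for a fixed bits-per-sample coder."""
    return bits_per_sample * sample_rate


# --- Companding (assumption: continuous mu-law curve, mu = 255) ---
MU = 255.0


def mu_law_compress(x: float) -> float:
    """Map x in [-1, 1] to [-1, 1], giving quiet samples more of the range."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x: float) -> int:
    """Compand, then quantize to a signed 8-bit code in -127..127."""
    return round(mu_law_compress(x) * 127)


def decode_8bit(code: int) -> float:
    return mu_law_expand(code / 127)


# --- Toy ADPCM (hypothetical; NOT G.721/G.726) ---
def adapt(step: float, code: int) -> float:
    """Grow the step after large codes, shrink it after small ones."""
    if abs(code) >= 6:
        return min(step * 1.5, 1.0)
    if abs(code) <= 1:
        return max(step * 0.9, 1e-4)
    return step


def adpcm_encode(samples, step: float = 0.01):
    """Send a 4-bit code (-8..7) for the difference between each sample
    and the prediction (here simply the previous reconstructed sample)."""
    codes, pred = [], 0.0
    for x in samples:
        code = max(-8, min(7, round((x - pred) / step)))
        codes.append(code)
        pred += code * step  # track exactly what the decoder will rebuild
        step = adapt(step, code)
    return codes


def adpcm_decode(codes, step: float = 0.01):
    """Rebuild samples by running the same predictor and step adaptation."""
    out, pred = [], 0.0
    for code in codes:
        pred += code * step
        out.append(pred)
        step = adapt(step, code)
    return out


if __name__ == "__main__":
    print(bit_rate(8))  # 64000: 8-bit companded PCM
    print(bit_rate(4))  # 32000: 4-bit ADPCM codes
```

In this toy, the decoder repeats the encoder's prediction and step-size updates from the codes alone, so nothing beyond the 4-bit codes is transmitted; 4 bits per sample at 8000 samples per second is where the 32 kbps figure comes from.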
Also, if your application is actually telephony, you can get significant compression simply by detecting silence: most conversations have only one speaker at a time, and the gaps between words are non-trivial.

There are a number of techniques for getting down to the 8-16 kbps range without seriously degrading quality. Essentially you're trading bandwidth against quality against processing complexity, and advances in DSP chips have made processing MUCH faster and cheaper than it used to be. One family of coding algorithms is Linear Predictive Coding (LPC): each sample is predicted as a weighted combination of the previous few samples, and the coder sends the predictor coefficients plus a compact description of the difference between the prediction and the actual sound. Another common technique is to split the speech energy into separate frequency bands and encode each band separately (sub-band coding).

Encryption equipment typically uses 2400, 4800, or 9600 bps voice, which gets digitally encrypted and sent over modems. The voice quality isn't bad, though there's some processing delay (fractions of a second, but enough to notice). It's certainly good enough for typical voice-mail or answering-machine use.

People here have been mentioning extremely-low-bit-rate synthetic algorithms, at rates like 300 bps, which basically identify the phonemes in your words (a bit simpler than speech-to-text, since you don't have to disambiguate spelling) and reconstruct them at the far end, with a voice that presumably doesn't resemble yours at all.

-- 
Pray for peace;					  Bill
# Bill Stewart 908-949-0705 erebus.att.com!wcs AT&T Bell Labs 4M-312 Holmdel NJ
# No, that's covered by the Drug Exception to the Fourth Amendment.
# You can read it here in the fine print.