Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!elroy.jpl.nasa.gov!lll-winken!ncis.tis.llnl.gov!lance.tis.llnl.gov!turner
From: turner@lance.tis.llnl.gov (Michael Turner)
Newsgroups: comp.dsp
Subject: Re: Compression Techniques for Speech
Message-ID: <1192@ncis.tis.llnl.gov>
Date: 13 Dec 90 23:15:58 GMT
References: <77352@sgi.sgi.com> <9508@pitt.UUCP> <367@rufus.UUCP>
Sender: news@ncis.tis.llnl.gov
Organization: University of California, Berkeley
Lines: 59

In article <367@rufus.UUCP> drake@drake.almaden.ibm.com writes:
>In article <9508@pitt.UUCP> dcollins@pittslug.sug.org.UUCP (Daniel Collins) writes:
>>I am working on a project where I have to compress a speech signal
>>by 20-to-1 ratio in real-time. If 20-to-1 not practical, what are
>>limiting factors, and what is achievable I will be using an 8 bit ADC
>>with a 8KHz conversion rate.
>
>So from 64 Kbits/second you want to do 20:1 compression, down to 3200 bits
>per second? Pretty aggressive. There's a company with a product that
>hooks to a standard serial port that claims to be able to do speech at
>1100 bits per second; the product is from Digispeech, called the DS201.
>The compression algorithm seems to be proprietary.
>
>Sam Drake / IBM Almaden Research Center
>Internet: drake@ibm.com BITNET: DRAKE at ALMADEN
>Usenet: ...!uunet!ibmarc!drake Phone: (408) 927-1861

And that's with all the time in the world to compress. The best
REAL-TIME compression I've heard about that preserves the signal
significantly is some CELP (code-excited linear prediction, large
codebook) technique (see recent IEEE AS&SP issues) that gets you down
to 9600 bits per second. However, you need a significant fraction of a
Cray to run at that rate, according to the author.

Of course, there's the wonderful Apple Macintosh compression technique
that runs in sublinear time: just throw out samples. But I assume you
want to be able to understand the speech when it's played back.
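
For anyone who wants to check the numbers in this thread, here is a
minimal Python sketch. The function names are made up for illustration
and come from none of the posters. It also assumes that "throw out
samples" means plain decimation with no anti-alias filtering, which is
only a guess at what the Macintosh technique actually does.

```python
def bit_rate(bits_per_sample, sample_rate_hz):
    """Raw PCM bit rate in bits per second."""
    return bits_per_sample * sample_rate_hz


def compressed_rate(raw_bps, ratio):
    """Bit rate left after compressing raw_bps by ratio-to-1."""
    return raw_bps / ratio


def drop_samples(samples, keep_every):
    """Naive decimation (assumed): keep one sample in every keep_every."""
    return samples[::keep_every]


# Figures quoted in the thread: 8-bit ADC at 8 kHz, 20:1 target.
RAW_BPS = bit_rate(8, 8000)                  # 64000 bits per second
TARGET_BPS = compressed_rate(RAW_BPS, 20)    # 3200.0 bits per second

# Ratios implied by the other rates mentioned.
RATIO_AT_9600 = RAW_BPS / 9600               # about 6.7 to 1 (CELP figure)
RATIO_AT_1100 = RAW_BPS / 1100               # about 58 to 1 (DS201 claim)

# Reaching 20:1 by dropping samples alone: one second of audio
# shrinks from 8000 samples to 400.
one_second = list(range(8000))               # stand-in for real samples
decimated = drop_samples(one_second, 20)
effective_rate_hz = len(decimated)           # 400 samples per second
nyquist_hz = effective_rate_hz / 2           # 200.0 Hz
```

On these numbers, sample dropping alone would leave a 400 Hz sampling
rate at 20:1, so nothing above 200 Hz could be represented. That is
below most of the speech band, which is why the result would not be
understandable.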

I suggest you revise your constraints, either the compression ratio or
the real-time response, or both. I'm no expert (yet) at speech
compression, but I think you're out of luck.

On the subject, however: I'm always on the look-out for NON-real-time
compression algorithms (similar sampling rates, accuracy and
compression ratio to the above problem). I know about Moser, etc. I'm
most interested in techniques that exploit knowledge of perceptual
limitations in hearing and production limitations in speech to figure
out what parts of the raw signal can be thrown out.

Assume that the speech has already been "recognized" down to something
like the phoneme level, and that this information can be used in the
compression algorithm. Assume also a single non-singing speaker with
little background noise. I'm interested in good extraction and
reproduction of nasal antiresonances, subglottal coupling, pitch-pulse
shape, etc. For higher (16KHz) rates, getting believable sibilance is
high on my list as well.* A parametric representation that allows
control of variation of stress factors (duration, pitch, amplitude) is
important.

I see that the relevant techniques are out there, but I'm having
trouble finding them all and putting them all together. It doesn't
help that I'm a nearly-total neophyte with DSP, which seems to be the
dialect of Greek that most of the relevant literature is written in.
On the other hand, I do know something about phonetics and phonology,
which is Swahili to a lot of DSP folk.
---
Michael Turner
turner@tis.llnl.gov

* Just today I was talking on some bandwidth-limited line to a friend
who couldn't understand that I was talking about our mutual friend
BRUCE, not somebody I'd never met named RUTH.