Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!crdgw1!uunet!bellcore!walter!messy!mo
From: mo@messy.bellcore.com (Michael O'Dell)
Newsgroups: comp.music
Subject: Re: Digitized Voices
Message-ID: <1991May31.192614.4890@walter.bellcore.com>
Date: 31 May 91 19:26:14 GMT
References: <21576.9105302008@uk.ac.keele.seq1>
Sender: news@walter.bellcore.com (All the News - Period)
Reply-To: mo@bellcore.com (Michael O'Dell)
Organization: Center for Chaotic Repeatabilty
Lines: 42
Nntp-Posting-Host: messy

Yes, there have been a number of systems built.

The first I know of was "speak", done by Doug McIlroy at Bell Labs
and distributed with early Unix systems. It was used by a blind
friend of mine to build the world's first talking terminal.

The next system I know about is usually referred to as "The NRL
Rules", a system developed at the US Naval Research Labs.

These two systems are similar in several ways. Both were created to
drive a Votrax phoneme-based synthesizer (manufactured by Federal
Screw Works, I kid you not), and both are rather like production
systems with exception handling. The underlying machinery isn't hard
to implement, and both do a reasonably utilitarian, if sometimes
quite humorous, job most of the time, but by no stretch do they
generate natural-sounding extended speech. The NRL stuff has been
given away, and several other talking terminals and such have
extended the NRL ruleset in conjunction with the SC01 and SC02 chip
versions of the Votrax.

The next class of system to come along is the DECtalk, which was an
amazing improvement in speech intelligibility for untrained
listeners. I believe it is an allophone synthesizer down deep, but it
does a LOT of analysis trying to do something with intonation and
stress. It ain't perfect, but it is MUCH better than NRL or SPEAK for
untrained listeners and random text. Of course, this is a hardware
box (has at least one Moto 68K inside, I think).

The next generation is a system called ORATOR which was done here at Bellcore.
It was designed to be very, very good at pronouncing names of people
and places and such (from text), in addition to good general text
conversion. The underlying synthesizer is a demi-syllable
synthesizer. A demi-syllable is a half syllable, and the analysis
software generates a stream of demi-syllables from the text. The
demi-syllable stream goes to a waveform synthesizer which uses some
quite wizardly LPC techniques to actually generate the output digital
audio. It sounds amazingly good, at least the few times I've heard
it.

Of all these things, I believe the NRL stuff is still freely
available. And I know Bellcore is actively interested in licensing
the ORATOR technology. (Hey, I do work here!)

	-Mike O'Dell

Bellcore?? Bellcore isn't allowed to have opinions,
so these MUST be mine!