Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!crdgw1!uunet!bellcore!walter!messy!mo
From: mo@messy.bellcore.com (Michael O'Dell)
Newsgroups: comp.music
Subject: Re: Digitized Voices
Message-ID: <1991May31.192614.4890@walter.bellcore.com>
Date: 31 May 91 19:26:14 GMT
References: <21576.9105302008@uk.ac.keele.seq1>
Sender: news@walter.bellcore.com (All the News - Period)
Reply-To: mo@bellcore.com (Michael O'Dell)
Organization: Center for Chaotic Repeatabilty
Lines: 42
Nntp-Posting-Host: messy


Yes, there have been a number of systems built.  The first I know of was
"speak", done by Doug McIlroy at Bell Labs and distributed with early
Unix systems.  It was used by a blind friend of mine to build the
world's first talking terminal.  The next system I know about
is usually referred to as "The NRL Rules", a system developed at
the US Naval Research Laboratory.  These two systems are similar in
several ways: both were created to drive a Votrax phoneme-based
synthesizer (manufactured by Federal Screw Works, I kid you not), both
are rather like production systems with exception handling, and in both
the underlying machinery isn't hard to implement.  Both do a reasonably
utilitarian, if sometimes quite humorous, job most of the time, but by no
stretch does either generate natural-sounding extended speech.
The NRL rules have been given away, and several other talking terminals
and such have extended the NRL ruleset for use with the SC01 and SC02
chip versions of the Votrax.
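
To give a feel for what "production system with exception handling"
means here, below is a minimal Python sketch.  It is NOT the NRL ruleset
or McIlroy's code: the rule table, the exception list, the "$"
end-of-word marker, and the phoneme names are all invented for
illustration, and real rulesets are far larger.  The idea is simply that
whole-word exceptions are checked first, then context-sensitive rules are
tried in order at each position, and the first match wins.

```python
# Toy letter-to-sound converter: ordered rewrite rules plus exceptions.
# All rules and phoneme names are hypothetical, made up for this sketch;
# they are not the NRL table and not Votrax phoneme codes.

# Whole words that the rules would get wrong are looked up directly.
EXCEPTIONS = {
    "one": ["W", "AH", "N"],
}

# Each rule is (left context, text to match, right context, phonemes).
# An empty context means "anything"; "$" as right context means end of word.
RULES = [
    ("", "ch", "", ["CH"]),
    ("", "sh", "", ["SH"]),
    ("", "ee", "", ["IY"]),
    ("", "e", "$", []),      # silent final e
    ("", "a", "", ["AE"]),
    ("", "e", "", ["EH"]),
    ("", "i", "", ["IH"]),
    ("", "o", "", ["AA"]),
    ("", "u", "", ["AH"]),
]


def to_phonemes(word):
    """Convert one word to a list of (made-up) phoneme names."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])

    out = []
    i = 0
    while i < len(word):
        for left, match, right, phones in RULES:
            if not word.startswith(match, i):
                continue
            end = i + len(match)
            # Left context must appear immediately before position i.
            if left and not word.endswith(left, 0, i):
                continue
            # Right context: "$" means the match must end the word.
            if right == "$":
                if end != len(word):
                    continue
            elif right and not word.startswith(right, end):
                continue
            out.extend(phones)
            i = end
            break
        else:
            # No rule applied: fall back to the letter itself.
            out.append(word[i].upper())
            i += 1
    return out


if __name__ == "__main__":
    for w in ["cheese", "ship", "one"]:
        print(w, to_phonemes(w))
```

The resulting phoneme list is what would then be translated into codes
for the synthesizer.  Rule order matters: "ee" has to come before the
single-letter "e" rules, or it would never fire.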

The next class of system to come along was the DECtalk, which was an
amazing improvement in speech intelligibility for untrained listeners.
I believe it is an allophone synthesizer down deep, but it does a LOT of
analysis trying to do something with intonation and stress.  It ain't
perfect, but it is MUCH better than NRL or "speak" for untrained
listeners and random text.  Of course, this is a hardware box
(has at least one Moto 68K inside, I think).

The next generation is a system called ORATOR which was done here at Bellcore.
It was designed to be very, very good at pronouncing names of people
and places and such (from text), in addition to good general text conversion.
The underlying synthesizer is a demi-syllable synthesizer.
A demi-syllable is a half syllable, and the analysis software
generates a stream of demi-syllables from the text.  The demi-syllable
stream goes to a waveform synthesizer which uses some quite
wizardly LPC techniques to actually generate the output digital audio.
It sounds amazingly good, at least the few times I've heard it.
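
I can't show what ORATOR actually does inside, but for anyone who hasn't
met LPC before, here is the generic textbook idea as a small Python
sketch: an excitation signal (a pulse train for voiced sounds) is run
through an all-pole filter whose coefficients describe the vocal tract.
The function names, the coefficient value, and the pitch period below are
assumptions picked for illustration, not anything taken from ORATOR.

```python
# Generic LPC synthesis sketch: s[n] = gain * e[n] + sum_k a[k] * s[n-k].
# This is the standard all-pole filter, not ORATOR's actual method;
# the names and parameter values here are hypothetical.

def impulse_train(length, period):
    """Crude voiced excitation: a unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def lpc_synthesize(excitation, coeffs, gain=1.0):
    """Run the excitation through an all-pole filter.

    coeffs[k-1] is the predictor coefficient applied to the output
    sample k steps in the past.
    """
    out = []
    for n, e in enumerate(excitation):
        sample = gain * e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                sample += a * out[n - k]
        out.append(sample)
    return out


if __name__ == "__main__":
    # One-pole toy filter excited by a pulse every 4 samples.
    excitation = impulse_train(8, 4)
    print(lpc_synthesize(excitation, [0.5]))
```

In a real synthesizer the coefficients change every few milliseconds as
the sound changes, and the interesting work is in choosing them and in
joining the units smoothly.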

Of all these things, I believe the NRL stuff is still freely available.
And I know Bellcore is actively interested in licensing the ORATOR
technology.  (hey, I do work here!)

        -Mike O'Dell

Bellcore?? Bellcore isn't allowed to have opinions, so these MUST be mine!