Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!wrdis01!gatech!purdue!haven.umd.edu!mimsy!leviathan.cs.umd.edu!ogata
From: ogata@leviathan.cs.umd.edu (Jefferson Ogata)
Newsgroups: comp.music
Subject: Re: voice synthesizer
Message-ID: <33614@mimsy.umd.edu>
Date: 26 Apr 91 18:08:20 GMT
References: <71181@eerie.acsu.Buffalo.EDU> <1991Apr18.230956.20033@vicorp.com> <33454@mimsy.umd.edu> <1991Apr25.210916.348@vicorp.com>
Sender: news@mimsy.umd.edu
Reply-To: ogata@leviathan.cs.umd.edu (Jefferson Ogata)
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 76

I wrote:
|> I believe that this is just a pitch transposer coupled with a slightly
|> modified vocal inflection. I know that I've gotten similar effects
|> messing with pitch transposers, although my voice *already* sounds
|> male, so...

In article <1991Apr25.210916.348@vicorp.com> ron@sunspark.UUCP (Ron Peterson) writes:
|> How do you transpose a voice in pitch without losing its natural sound?
|> I've heard of pitch transposers that convert the input signal to a
|> square wave and then multiply or divide it to get a fundamental pitch
|> that is an octave higher or lower, but this destroys all of the 
|> information contained in the shape of the waves.  Is there another
|> way to do it?  And how do you get sub-octave transposition? 
|> ron@vicorp.com or uunet!vicorp!ron

Here is a primitive description. Actual algorithms are more refined,
especially in what data they decide to throw away.

Measure the frequency of the input (using zero-crossings, for example).
Digitize the input. Then:

Down an octave:
   Keep every other input cycle. Throw the others away.
   For each output period (twice the input period), play back the
saved cycle so that it takes twice as long. For example, output each
input sample twice. Or, for better results, interpolate each sample
point with the following one to get your extra point.

Up an octave:
   Throw away every other sample point.
   For each input period (twice the output period), output the
thinned-out cycle twice.
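
The two octave recipes above can be sketched roughly as follows. This
is only an illustration under simplifying assumptions, not how any
real unit does it: the function names are made up, the period is
assumed to be already measured, constant, and an even whole number of
samples, and the last interpolated point of each cycle wraps around to
the start of the same cycle.

```python
def octave_down(samples, period):
    """Drop every other cycle and stretch each kept cycle to twice its length."""
    out = []
    # Walk the input two cycles at a time.
    for start in range(0, len(samples) - 2 * period + 1, 2 * period):
        cycle = samples[start:start + period]  # kept cycle; the next one is skipped
        for i, s in enumerate(cycle):
            # Assumption: wrap to the cycle start for the last point.
            following = cycle[(i + 1) % period]
            out.append(s)
            out.append((s + following) / 2.0)  # interpolated extra point
    return out


def octave_up(samples, period):
    """Drop every other sample point and play each thinned cycle twice."""
    out = []
    for start in range(0, len(samples) - period + 1, period):
        cycle = samples[start:start + period]
        thinned = cycle[::2]  # throw away every other sample point
        out.extend(thinned)
        out.extend(thinned)
    return out
```

Both functions return as many samples as they consume, so the output
lasts as long as the input and only the pitch changes.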

For other intervals of transposition, you have to throw away/duplicate
different amounts of information.
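
One way to picture the general case (my own generalization for
illustration, not a description of any particular product) is to read
through the current input cycle at a different speed, wrapping around
whenever the read position runs off the end. The name transpose() and
its parameters are hypothetical; ratio is the frequency ratio, which
in equal temperament is 2 ** (semitones / 12). The period is again
assumed to be known and constant.

```python
def transpose(samples, period, ratio):
    """Re-read each input cycle at `ratio` times normal speed.

    ratio > 1 raises the pitch (parts of a cycle get repeated),
    ratio < 1 lowers it (parts of each cycle are never read).
    """
    out = []
    phase = 0.0  # read position inside the current cycle, in samples
    usable = (len(samples) // period) * period  # whole cycles only
    for n in range(usable):
        start = (n // period) * period  # first sample of the current cycle
        i = int(phase)
        frac = phase - i
        a = samples[start + i]
        b = samples[start + (i + 1) % period]  # wrap inside the cycle
        out.append(a + frac * (b - a))  # linear interpolation
        phase = (phase + ratio) % period
    return out
```

With ratio 2.0 this reduces to the "up an octave" recipe. With ratio
0.5 it reads only half of each cycle, which discards as much as the
"down an octave" recipe does but picks different pieces.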

Now regular pitch transposers don't really make a "natural" sounding
voice, because the algorithm isn't so great (especially the frequency
tracking, because of noise from sibilants) and also because they
transpose sibilant noise. Sibilant noise should stay in the same
frequency range no matter what the pitch of the voice is. An S should
sound the same whether I am singing high or low. Other breath
noise has the same problem, but it really comes out in S, SH, TH,
F, etc. I correct for this by adjusting my pronunciation of the
sibilants. If I am transposing up, I sing an S as a SH, so it
comes out sounding like S. Transposing down I do the opposite. It
is very difficult to get a really natural sounding voice, but
changing your sibilants makes a big difference. For a good example,
listen to the Chipmunks on Saturday morning cartoons. These are
voices transposed straight up with no sibilant adjustment.
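
To see why sibilants upset the frequency tracking, here is a minimal
sketch of a zero-crossing frequency estimate. Everything in it is an
assumption for illustration: the function name, the 8000 Hz sample
rate, the 220 Hz test tone, and the use of uniform random noise as a
stand-in for an S sound.

```python
import math
import random


def estimate_frequency(samples, sample_rate):
    """Estimate frequency from the spacing of upward zero crossings."""
    crossings = []
    for i in range(1, len(samples)):
        if samples[i - 1] < 0.0 <= samples[i]:
            # Interpolate to find where between the two samples the wave crosses zero.
            frac = -samples[i - 1] / (samples[i] - samples[i - 1])
            crossings.append(i - 1 + frac)
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period


SAMPLE_RATE = 8000  # hypothetical sample rate in Hz

# A clean 220 Hz tone: the estimate lands very close to 220.
tone = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE + 0.3) for n in range(800)]
tone_estimate = estimate_frequency(tone, SAMPLE_RATE)

# Noise standing in for a sibilant: the "frequency" is meaningless.
rng = random.Random(0)
hiss = [rng.uniform(-1.0, 1.0) for _ in range(800)]
hiss_estimate = estimate_frequency(hiss, SAMPLE_RATE)
```

On the noise the estimator still reports a number, but it has nothing
to do with the pitch being sung, and a transposer that trusts it will
jump around.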

The further the transposition, the worse the sibilants are
distorted. This is why pitch-riders don't screw up the voice;
they are typically transposing less than a semitone, which is
pretty much okay.
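
A little arithmetic shows the difference in scale. The sketch below
assumes equal temperament (a semitone is a frequency ratio of
2 ** (1/12)) and uses a made-up 6000 Hz figure as the center of the
sibilant noise; both the function name and that figure are only for
illustration.

```python
def shifted_frequency(freq_hz, semitones):
    """Frequency after transposing by a number of equal-tempered semitones."""
    return freq_hz * 2.0 ** (semitones / 12.0)


SIBILANT_HZ = 6000.0  # hypothetical center of the S noise

octave_up_hz = shifted_frequency(SIBILANT_HZ, 12)  # doubles the frequency
semitone_up_hz = shifted_frequency(SIBILANT_HZ, 1)  # a bit under 6 percent higher
```

An octave moves the noise to twice its frequency, while a full
semitone moves it by less than 6 percent, and a pitch-rider usually
moves it less than that.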

The pitch transposition machine I usually use (Digitech IPS-33)
deliberately doesn't update its frequency estimate very fast, so it
avoids tracking all over the place during a sibilant. This is a big tradeoff: if
the machine tracks pitch too quickly, it's wrong most of the time
in a word like "fist", where the noise has indeterminate frequency.
But if it doesn't track fast enough there will be audible delay
during melodic lines. The Digitech is set to a fairly reasonable
tracking rate. I think there is a PLL tied to the input with a
limit on slew rate, and the processor measures the frequency of the
PLL rather than trying to decompose the input.  I'm not sure about
this, though; it just seems like the right way to do it.
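
The tradeoff can be illustrated with a plain slew-rate limit on a
stream of frequency estimates. To be clear, this is not a PLL and I am
not claiming it is what the Digitech does; the function name, the
parameter, and the numbers are all hypothetical.

```python
def slew_limited_track(raw_estimates, max_step_hz):
    """Follow raw frequency estimates, moving at most max_step_hz per update."""
    tracked = []
    current = None
    for estimate in raw_estimates:
        if current is None:
            current = estimate  # lock straight onto the first estimate
        else:
            delta = estimate - current
            # Clamp the change so one wild reading cannot drag the pitch far.
            delta = max(-max_step_hz, min(max_step_hz, delta))
            current += delta
        tracked.append(current)
    return tracked


# A single wild estimate (a sibilant) barely moves the tracked pitch...
glitch = slew_limited_track([220, 220, 3000, 220, 220], 10)

# ...but a real jump in the melody takes several updates to catch up.
leap = slew_limited_track([220, 440, 440, 440], 100)
```

A small max_step_hz rides through the sibilant but lags on melodic
leaps; a large one follows the melody but also follows the noise.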

Hope this helps.

--
Jefferson Ogata                 ogata@cs.umd.edu
University of Maryland          Department of Computer Science
   "Sure. Understanding today's complex world of the future *is*
          a little like having bees live in your head."