Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!wrdis01!gatech!purdue!haven.umd.edu!mimsy!leviathan.cs.umd.edu!ogata
From: ogata@leviathan.cs.umd.edu (Jefferson Ogata)
Newsgroups: comp.music
Subject: Re: voice synthesizer
Message-ID: <33614@mimsy.umd.edu>
Date: 26 Apr 91 18:08:20 GMT
References: <71181@eerie.acsu.Buffalo.EDU> <1991Apr18.230956.20033@vicorp.com> <33454@mimsy.umd.edu> <1991Apr25.210916.348@vicorp.com>
Sender: news@mimsy.umd.edu
Reply-To: ogata@leviathan.cs.umd.edu (Jefferson Ogata)
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 76

I wrote:
|> I believe that this is just a pitch transposer coupled with a slightly
|> modified vocal inflection. I know that I've gotten similar effects
|> messing with pitch transposers, although my voice *already* sounds
|> male, so...

In article <1991Apr25.210916.348@vicorp.com> ron@sunspark.UUCP (Ron Peterson) writes:
|> How do you transpose a voice in pitch without losing its natural sound?
|> I've heard of pitch transposers that convert the input signal to a
|> square wave and then multiply or divide it to get a fundamental pitch
|> that is an octave higher or lower, but this destroys all of the
|> information contained in the shape of the waves. Is there another
|> way to do it? And how do you get sub-octave transposition?
|> ron@vicorp.com or uunet!vicorp!ron

Here is a primitive description. Actual algorithms are more refined,
especially in what data they decide to throw away.

Measure the frequency of the input (using zero-crossings, for example).
Digitize the input. Then:

Down an octave: Save every other input wave. Throw the other one away.
For each output wave period (twice the input period), output your
sampled input so it takes twice as long. For example, for each sample
of the input, output that sample twice. Or for better results,
interpolate each sample point with the following one to get your
extra point.
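
As a rough illustration of the zero-crossing measurement and the
down-an-octave step just described, here is a minimal Python sketch.
It is only one reading of the description above, not what any real
transposer does. The function names (rising_zero_crossings,
stretch_double, down_octave) are made up for this example, and it
assumes a clean, already digitized, periodic input with no noise.

```python
import math

def rising_zero_crossings(samples):
    """Indices where the signal goes from negative to non-negative."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] < 0 <= samples[i]]

def stretch_double(period):
    """Double a period's length by emitting each sample followed by the
    midpoint between it and the next sample (linear interpolation).
    The last sample has no successor, so it is simply repeated."""
    out = []
    for i, s in enumerate(period):
        nxt = period[i + 1] if i + 1 < len(period) else s
        out.append(s)
        out.append((s + nxt) / 2.0)
    return out

def down_octave(samples):
    """Keep every other input period and play each kept one at half
    speed, so the output period is twice the input period."""
    marks = rising_zero_crossings(samples)
    out = []
    # Periods are the spans between consecutive rising crossings;
    # keep periods 0, 2, 4, ... and throw the others away.
    for k in range(0, len(marks) - 1, 2):
        out.extend(stretch_double(samples[marks[k]:marks[k + 1]]))
    return out

# Hypothetical test tone: a sine with a period of 80 samples.
tone = [math.sin(2 * math.pi * (n + 0.5) / 80) for n in range(800)]
lowered = down_octave(tone)
```

Measuring the spacing of rising_zero_crossings(lowered) should give
about twice the spacing found in the input. On a real voice the
zero-crossing detector would need smoothing, since noise adds
spurious crossings.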
Up an octave: Throw away every other sample point. For each input wave
period (twice the output period), output the thinned-out sample stream
twice.

For other intervals of transposition, you have to throw away/duplicate
different amounts of information.

Now regular pitch transposers don't really make a "natural" sounding
voice, because the algorithm isn't so great (especially the frequency
tracking, because of noise from sibilants) and also because they
transpose sibilant noise. Sibilant noise should be at the same frequency
no matter what the pitch of the voice is. An S should sound the same
whether I am singing high or low. Other aspirant noise has the same
problem, but it really comes out in S, SH, TH, F, etc.

I correct for this by adjusting my pronunciation of the sibilants. If I
am transposing up, I sing an S as a SH, so it comes out sounding like S.
Transposing down I do the opposite. It is very difficult to get a really
natural sounding voice, but changing your sibilants makes a big
difference.

For a good example, listen to the Chipmunks on Saturday morning cartoons.
These are voices transposed straight up with no sibilant adjustment. The
further the transposition, the worse the sibilants are distorted. This
is why pitch-riders don't screw up the voice; they are typically
transposing less than a semitone, which is pretty much okay.

The pitch transposition machine I usually use (Digitech IPS-33)
deliberately doesn't guess frequency extremely fast, so that it can
avoid tracking all over the place during a sibilant. This is a big
tradeoff: if the machine tracks pitch too quickly, it's wrong most of
the time in a word like "fist", where the noise has indeterminate
frequency. But if it doesn't track fast enough there will be audible
delay during melodic lines. The Digitech is set to a fairly reasonable
tracking rate. I think there is a PLL tied to the input with a limit on
slew rate, and the processor measures the frequency of the PLL rather
than trying to decompose the input.
I'm not sure about this, though; it just seems like the right way to
do it.

Hope this helps.
--
Jefferson Ogata                 ogata@cs.umd.edu
University of Maryland          Department of Computer Science
"Sure. Understanding today's complex world of the future *is* a little
like having bees live in your head."