Path: utzoo!attcan!uunet!ginosko!usc!apple!wass
From: wass@Apple.COM (Steve Wasserman)
Newsgroups: comp.dsp
Subject: Re: Adjust-Speed CD player?
Message-ID: <4320@internal.Apple.COM>
Date: 22 Sep 89 22:02:43 GMT
References: <61860@tut.cis.ohio-state.edu> <4653@orca.WV.TEK.COM>
Organization: Apple Computer Inc, Cupertino, CA
Lines: 110

In article <4653@orca.WV.TEK.COM> mhorne%ka7axd.wv.tek.com@relay.cs.net writes:
>> 	I know nothing about DSP other than what I've figured out
	... much stuff deleted ...
>source, however, I suggest interpolating between the samples by simply
>convolving the data stream with a sinc function.
>

There are two separate problems to be solved here.

First: if you spin the CD faster than usual, what do you do with the
extra samples?  For example: if a CD is sped up such that 52.9
Ksamples/second are read (which represents an increase in speed of 6/5
or 20%), 8.8 "extra" Ksamples accumulate every second.  I am assuming,
of course, that the sound will be reconstructed by circuitry which
operates at a constant 44.1 KHz (or some oversampling multiple
thereof, I suppose).  I make this assumption because it would be
difficult to construct a variable analog reconstruction filter that
could handle a large range of possible sampling speeds (say plus or
minus five times the original sampling frequency).
This problem is called "sample rate conversion" or something similar
in textbooks.
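
For anyone who wants to check the arithmetic, here is a small sketch
of the numbers above.  The variable names are mine, and the unrounded
figures (52.92 and 8.82 Ksamples/second) are simply what the 6/5 ratio
gives.

```python
from fractions import Fraction

NOMINAL_RATE = 44_100        # samples/second the reconstruction circuitry expects
speedup = Fraction(6, 5)     # disc spun 20% faster, as in the example

read_rate = NOMINAL_RATE * speedup   # samples/second coming off the disc
extra = read_rate - NOMINAL_RATE     # samples/second with nowhere to go

print(float(read_rate))   # 52920.0
print(float(extra))       # 8820.0
```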

I don't think that it can be said that any one sample is "more
important" than any other sample because it is a local minimum or
maximum.  In fact, the suggested method of never dropping these
samples would introduce random noise into the signal.

In general, it is easy to convert between two sampling rates that are
rational multiples of each other (hence, I chose 6/5 in my example).
The first step is to interpolate the signal by the numerator, i.e.
convert it to a sampling rate six times the original in my example.
This is done simply by inserting five zeros after every sample and then
applying a digital filter.  Zero padding has the effect of replicating
the original spectrum a number of times.  A filter is used to remove
the unwanted copies of the original spectrum.  Sorry I can't think of
a good way to draw spectra using text only, but diagrams would be
helpful here.
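
In place of a diagram, here is a minimal sketch of the replication
effect.  It assumes a single test tone sitting exactly on DFT bin 3 of
a 32-sample block (my choice, purely for illustration), zero-pads it by
6, and lists which bins of the longer block carry energy.  The helper
names zero_stuff and dft_mag are hypothetical, and the DFT is a plain
direct sum rather than an FFT.

```python
import cmath
import math

def zero_stuff(x, L):
    """Insert L-1 zeros after every sample, raising the rate by L."""
    y = [0.0] * (len(x) * L)
    y[::L] = x
    return y

def dft_mag(x, k):
    """Magnitude of DFT bin k of x, computed by direct summation."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                   for i, v in enumerate(x)))

N, L, k0 = 32, 6, 3                       # block length, factor, tone bin
x = [math.cos(2 * math.pi * k0 * i / N) for i in range(N)]
y = zero_stuff(x, L)

# Bins from 0 up to half the new sampling rate that hold real energy.
peaks = [k for k in range(len(y) // 2 + 1) if dft_mag(y, k) > 1.0]
print(peaks)   # [3, 29, 35, 61, 67, 93]
```

Bin 3 is the original tone.  The other five are the copies that the
filter following the zero padding is there to remove.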

The next step is to filter out all (or most) spectral energy which
would be "aliased" when throwing away the unneeded samples.  This
involves applying another filter to quiet the components above the
Nyquist frequency (half the sampling rate) of the signal as it will be
after the extras are thrown out.  After this has been done, four of
every five samples can be safely thrown out without distorting the
signal.  (Always the same four out of the five.)

In practice, the two filters can be combined so the procedure is:
zero-pad, filter, and then throw away the unneeded samples.  People
have found more clever ways of doing this in some circumstances, but
in theory, this way is as good as any.  Obviously, if you want
to change the sampling rate by 7724/137, you have a problem.
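
The zero-pad, filter, throw-away procedure can be sketched roughly as
follows.  This is only an illustration under my own assumptions: the
filter is a Hamming-windowed sinc, the length parameter taps_per_phase
is made up, and the function names are hypothetical.  It is the
straightforward version, not one of the cleverer ones.  The call uses
up=6, down=5 to match the numbers in the text; bringing 52.92
Ksamples/second material back down to 44.1 would mean swapping them
(up=5, down=6).

```python
import math

def lowpass_taps(cutoff, num_taps):
    """Hamming-windowed sinc low-pass; cutoff is in cycles/sample (0 to 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = (2 * cutoff if t == 0
                 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t))
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def resample(x, up, down, taps_per_phase=16):
    """Change the sample rate of the list x by the ratio up/down."""
    # One filter does both jobs: it removes the copies created by zero
    # padding and anything that would alias when samples are discarded.
    cutoff = 0.5 / max(up, down)              # relative to the intermediate rate
    num_taps = 2 * taps_per_phase * max(up, down) + 1
    h = [up * c for c in lowpass_taps(cutoff, num_taps)]  # gain offsets the zeros
    delay = (num_taps - 1) // 2               # group delay of the symmetric filter

    stuffed = [0.0] * (len(x) * up)           # step 1: zero-pad
    stuffed[::up] = x

    out = []
    for m in range(0, len(stuffed), down):    # steps 2 and 3: filter, but only
        acc = 0.0                             # at the samples that are kept
        for j, c in enumerate(h):
            i = m + delay - j
            if 0 <= i < len(stuffed):
                acc += c * stuffed[i]
        out.append(acc)
    return out

# A sine at 0.05 cycles/sample, converted by 6/5.
x = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
y = resample(x, 6, 5)
print(len(x), len(y))   # 200 240
```

Away from the ends of the block, y should follow the same sine sampled
6/5 as densely.  The trouble with a ratio like 7724/137 is the
intermediate rate: 7724 times the original, which for 44.1 KHz audio
is roughly 340 MHz.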

>All this said, I don't think this is the optimal method for tone shifting,
>however it might work for `fast/slow-forward' effects.  If you wish to shift
>the tones while retaining the same sample rate, I would suggest some sort of
>frequency scaling algorithm, perhaps by doing a digital mix with a reference
>(digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),

The second problem is: once you've thrown away the right number of
samples, how do you make the pitch sound right?  Note that a mere
frequency translation by digitally mixing in a reference frequency is
not exactly what's required to make everything right again.  Spinning
the CD faster EXPANDS the spectrum of the original sound in frequency
-- it doesn't just shift it.  (unless, of course, you are looking on a
log scale :-) To prove this to yourself, imagine a recording of two
notes: concert A (440 Hz) and one octave above it (880 Hz).  When we
increase the CD speed by 20%, these two frequencies are changed to
528 and 1056 Hz.  Assume that we've thrown away the proper number of
samples from the original recording.  Now, if we mix the resultant
signal with an 88 Hz signal (528 - 440 = 88) and do the proper
filtering, we'll get 440 Hz and 968 Hz ... oops, they don't sound like
octaves any more.
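
The same arithmetic as a few lines of code, comparing a frequency
shift with a frequency scaling.  The variable names are mine; the
numbers are the ones from the example.

```python
from fractions import Fraction

notes = [440, 880]                      # concert A and the octave above, in Hz
speedup = Fraction(6, 5)                # 20% faster

fast = [f * speedup for f in notes]     # 528 Hz and 1056 Hz off the fast disc
offset = fast[0] - notes[0]             # 88 Hz puts the lower note back
shifted = [f - offset for f in fast]    # translation: 440 Hz and 968 Hz
scaled = [f / speedup for f in fast]    # compression: 440 Hz and 880 Hz

print(float(shifted[1] / shifted[0]))   # 2.2
print(float(scaled[1] / scaled[0]))     # 2.0
```

Only the scaled pair keeps the 2:1 ratio that makes an octave.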

Theoretically, what needs to be done is to compress the spectrum of
the speeded-up sound down to its original size.  This can be done by
applying the previously discussed interpolation/decimation method to
the FFT samples of the signal (I think) and then inverse transforming
and playing the signal out at the original sampling rate.  I'm sure
that somebody has come up with a computationally superior method to
the one I have suggested.


(note: invert this discussion if you want to talk about slowing a
recording down.)


>> 	Also, if you're going to remove samples, I think you
>> shouldn't use a simple kill-every-nth-sample procedure...
>
>If you want to throw away samples, you *really* need to filter the data before
>doing so, otherwise you will see (hear) aliasing of the data, depending upon
>the spectra of the input and how often you are throwing away samples.  When
>you decimate any sampled data set, you must low-pass filter the data at half
>the new sample rate (Nyquist rule) unless you are sure that the data has
>no spectral components above half the new sample rate.
>
>followed by a carrier and lower sideband suppression (Hilbert transform filters
>are very easy to implement digitally).  At a fast glance, I think this might
>work well for moving the spectra of an audio source up/down some arbitrary
>frequency, and should be doable with some of the common DSP chips currently
>available.
>
>Mike Horne
>Visual Systems Group
>Tektronix, Inc.
>mhorne@ka7axd.wv.tek.com


-- 
swass@apple.com