Path: utzoo!attcan!uunet!portal!cup.portal.com!doug-merritt
From: doug-merritt@cup.portal.com
Newsgroups: comp.sys.amiga
Subject: Re: Sampling at 29KHz (long)
Message-ID: <5637@cup.portal.com>
Date: 19 May 88 18:22:14 GMT
References: <2845@polya.STANFORD.EDU> <734@eos.UUCP> <53788@sun.uucp>
Organization: The Portal System (TM)
Lines: 180
XPortal-User-Id: 1.1001.4407

A number of questions have been raised about sampled sound by several
people, and the following posting is long because it answers all of
the ones that weren't answered previously, as well as correcting some
errors in statements by previous posters. If you don't care about
high quality sound generation, don't read it.

Since Tom got a mild flame for saying he was not an expert, I guess I
should say that I *am* an expert, as far as the following goes.
Trust me. 1/2 :-)

Tom Rokicki wrote:
>	Now let's sample our 10KHz sine wave at 20.2KHz.  We
>are now slightly off our frequency, and we will see a 10KHz
>tone modulated by a 200 Hz carrier; this will sound like two
>narrowly separated frequencies beating against one another.
>Ugly as sin.

The beat frequencies are the sum and difference of the sampling
frequency and the highest frequency component in the sample.
There will always be a beat frequency, the question is what to
do with it. If you have the lowpass filter turned on (always on
the 1000, btw), then you aim to put the beat frequency above
the range of the lowpass filter (see pg 156, Fig 5-7 of the Hardware
Reference Manual). Otherwise:

There are two ways to remove aliasing of the sampling frequency.
One is to remove it via an add-on hardware filter (a 200hz notch
filter, for instance). The other is to transform the energy of the
beat frequency into random noise, which is far less annoying to the
ear, and distributes the energy across the entire spectrum, so that
any given component of the noise will be very low energy. This is
actually the *preferred* method (in the absence of a low pass filter),
according to the best minds in sampling theory (not me; but I've been
to talks on the subject). It's better than a notch filter because you
don't lose the original signal at the notch frequency.

The way that you transform the alias into noise is to randomly vary
the sampling frequency (actually ideally it should follow a Poisson
distribution, but random works pretty well). In other words, instead
of sampling exactly every N microseconds, you sample every N+random()
microseconds, where 0 <= random() <= N. Thus you still get an alias/beat,
but its frequency varies randomly on each sample.
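
Here's a minimal sketch of that timing rule, in Python rather than
anything Amiga-specific. The name jittered_sample_times and the uniform
random source are my assumptions for illustration; it only computes the
sample instants, it doesn't touch any hardware.

```python
import random

def jittered_sample_times(n_samples, period_us, rng=None):
    """Return sample instants in microseconds, where each interval is
    period_us + r with r drawn uniformly from [0, period_us]."""
    rng = rng or random.Random()
    times = []
    t = 0.0
    for _ in range(n_samples):
        times.append(t)
        # N + random(), with 0 <= random() <= N
        t += period_us + rng.uniform(0.0, period_us)
    return times

# Example: 8 sample instants with a nominal 50 microsecond period.
times = jittered_sample_times(8, 50.0, random.Random(1))
intervals = [b - a for a, b in zip(times, times[1:])]
```

Note that with 0 <= random() <= N the average interval comes out to
1.5*N, so the average sample rate is lower than the nominal 1/N.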

(BTW the equivalent method for graphics is to sample on a randomly
disturbed grid, and Pixar uses this method for improving ray tracing.
The human eye does the same thing: outside the fovea, the rods are
placed by a Poisson distribution, to eliminate aliasing.)

As far as I can tell from a quick glance at the Hardware Reference
Manual, this is perfectly feasible if you use Direct (Non-DMA)
output, with a maximum resolution of 280 nanoseconds. See page 161,
The Audio State Machine. As far as DMA is concerned, intuitively
you would think you're stuck with aliasing due to a fixed period,
but you *might* be able to pull the same trick if you period-modulate
one channel with random data in another. See pg 151, Table 5-5.

Tom again:
>	Oh, so you want to know how to create the smaller samples from
>the larger ones.  I'll leave that to the experts out there.

It's easy...you just do a (weighted) average down. If you go from 256
to 32 samples then you just average each 8 adjacent samples (weight of
1 since 256 = 8*32) in the source into one sample in the destination.

If you wanted to go from 256 to say 49 samples, then you'd average each
5.224 samples (5.224 = 256/49) together. In other words, the first
destination sample equals the sum of the first five source samples,
plus 0.224 times the sixth source sample, divided by 5.224. This leaves
you with 0.776 worth of the sixth sample left over, so for the
second destination sample you take that, plus the next 4 samples,
plus 0.448 (= 5.224-4-0.776) times the sample after those, all divided
by 5.224. And continue.
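
The averaging described above can be sketched roughly like this. The
function name average_down is made up for the example, and it works in
floating point for clarity; a real 68000 version would presumably use
fixed point. Each destination sample covers 256/49 (or whatever the
ratio is) source samples, and each source sample is weighted by how
much of it falls inside that span.

```python
def average_down(src, n_out):
    """Shrink src to n_out samples by weighted averaging."""
    ratio = len(src) / n_out          # source samples per output sample
    out = []
    for i in range(n_out):
        start = i * ratio             # span of source covered by output i
        end = (i + 1) * ratio
        total = 0.0
        j = int(start)
        while j < len(src) and j < end:
            # Fraction of source sample j (spanning [j, j+1)) inside the span
            overlap = min(j + 1, end) - max(j, start)
            total += src[j] * overlap
            j += 1
        out.append(total / ratio)
    return out

# 256 -> 32 is the plain average of each 8 adjacent samples;
# 256 -> 49 uses the fractional weights.
ramp = list(range(256))
by_8 = average_down(ramp, 32)
by_frac = average_down(ramp, 49)
```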

The reason this works is that averaging is the time domain equivalent
of a frequency domain low pass filter.
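
A quick way to convince yourself, as a sketch only: average blocks of
8 samples from a slow sine and from a fast one and compare what's left.
The names here are invented for the example, and a plain average is a
fairly crude low pass filter, so this shows the direction of the effect
and nothing more.

```python
import math

def block_average(samples, n):
    """Average each run of n adjacent samples into one value."""
    return [sum(samples[i:i + n]) / n
            for i in range(0, len(samples) - n + 1, n)]

# One cycle per 256 samples (low frequency) vs one cycle per 8 (high).
slow = [math.sin(2 * math.pi * k / 256) for k in range(256)]
fast = [math.sin(2 * math.pi * k / 8) for k in range(256)]

# Largest amplitude surviving the 8-sample average in each case.
slow_peak = max(abs(v) for v in block_average(slow, 8))
fast_peak = max(abs(v) for v in block_average(fast, 8))
```

The slow sine comes through almost untouched, while the fast one,
whose period matches the averaging window, cancels out.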

Now as to Phil's posting:

Phil Stone then posts to critique Tom's perfectly good article,
in particular about its length and unrealistic examples. Just for
the record, Phil's article was almost as long as Tom's (66 vs 84 lines),
and used even less realistic examples, because he concentrated on
what he wanted to see added to the system, where everyone else has
been talking mostly about what *is*. People who live in glass houses...

Phil also writes:
>Compute a 256-point sine wave and put it in memory (32-byte sine waves
>don't cut it in my book - even a tin ear can hear the interpolation
>noise in fixed, jagged steps that big).  With a fixed sample-playback
>increment and a maximum rate of 29 KHz, the highest frequency you can
>generate is 29000/256 = 113 Hz! - just about TWO OCTAVES below A440(!)

That's why Tom (or was it Chuck?) talked about 32 byte samples...they
were being realistic about the current hardware, you see. Besides, what
you're talking about has to do with accuracy of reproduction of the
higher frequency harmonics needed to synthesize, say, a musical instrument's
timbre, not "the highest frequency you can generate". The highest
frequency component is strictly a function of sample rate. What you're
talking about is the highest frequency *fundamental*. And 113hz doesn't
give much range, now does it?

Nonetheless I'd have to agree that it would be nice if you could
sample faster so as to raise the frequency of the highest achievable
fundamental, while still accurately reproducing timbre. But note that
you *have* the following feature:

>With this same sine wave and a variable (intergral.fractional) sampling
>increment, one could generate a maximum frequency of 14 KHz!

My guess is that you can do this already, as I discussed above.
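
For reference, here's a software sketch of the integral.fractional
increment Phil describes. This is not an Amiga hardware feature or
anything from the manual; the names TABLE_LEN, increment_for and play
are mine, and the position is simply truncated to pick a table entry,
with no interpolation.

```python
import math

TABLE_LEN = 256
SAMPLE_RATE = 29000.0   # Hz, the playback rate quoted in the thread

# One period of a sine wave, as in Phil's example.
table = [math.sin(2 * math.pi * k / TABLE_LEN) for k in range(TABLE_LEN)]

def increment_for(freq_hz):
    """Table positions to advance per output sample (integer.fraction)."""
    return freq_hz * TABLE_LEN / SAMPLE_RATE

def play(freq_hz, n_samples):
    """Step through the table with a fractional increment, truncating
    the position to choose a table entry."""
    step = increment_for(freq_hz)
    phase = 0.0
    out = []
    for _ in range(n_samples):
        out.append(table[int(phase)])
        phase = (phase + step) % TABLE_LEN   # wrap around the table
    return out

a440 = play(440.0, 100)
```

An increment of exactly 1 gives the 29000/256 = 113 Hz fundamental from
Phil's post, and an increment of 128 (two samples per cycle) gives
14.5 KHz, which is where his "14 KHz" figure comes from.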

>Your postscript tosses off a reference to *making* the other octaves of
>sound - this is called interpolation, a process which has earned many Ph.D's
>over the last ten years.  Try interpolating 4 octaves from one sample *live*
>sometime (can you say "Cray killer?")

Gross exaggeration. The papers on the topic of interpolating down
from larger to smaller waveforms appeared a lot more than 10 years ago,
and a 68000 with a math chip is perfectly capable of keeping up with the
demands. A Cray could do a full orchestra in real time.

Then Tom posts again:
>Yes, but because of the integral.fractional nature of your incrementing,
>you are guaranteed to introduce some amount of garbage into your
>sound; all your peaks won't peak at the same point, etc.  Although
>this effect probably won't become really bothersome until you are up
>to about 1/8 of the maximum frequency or so.

Quite accurate, but there will ALWAYS be some garbage of SOME sort
introduced. The integral.fractional notion does a better job of
minimizing it than if you just truncate!!! The main problem introduced
by that method is that you can no longer have a fixed length sample that
is exactly one period long, which nominally forces you to use a non-DMA
method. Except that 1) Phil was making a wish list for something new,
and you could conceivably add hardware features to restart the sample
at the appropriate point beyond the beginning, and 2) my suggestion
about using a DMA channel for period modulation could probably fix this,
too, if that in fact works.

>But it only needs be done once for each sample . . . even a full FFT,
>filter, and reverse FFT doesn't take long for a 256 byte sample.  Not
>real time, true, but for synthesis?  Faster than you can figure parameters
>for the sample.

I guess everyone is assuming you need to do a full FFT, that's why
they have problems with speed. But the type of filtering required doesn't
need a full FFT; it's just a question of averaging.

>Well, I'm trying to learn myself, so I presented what I think I
>understand to be the case, and I hope to be corrected where I am
>wrong.

I hope Phil and Chuck feel the same way, or I'm gonna get flamed! :-)

And finally, Chuck McManis writes:

> The Amigas low pass filters start cutting of 
>frequencies above 7Khz and pretty much eliminate everything above
>14Khz.

That should be 5Khz and 7Khz, respectively. See pg 154-157, Aliasing
Distortion.

>That means that even your golden ear may have difficulty
>in hearing the differences. (You will always get .4% distortion 
>because that is the monotonic difference between to 'points' in 
>space.)

Exactly. Except that again, this is where the trick of adding a
random jitter to the sampling frequency wins, because it distributes
the distortion throughout the audio range, rather than consistently
creating the same error on every tone.

Wow, made it all the way to the end, did you? Must be real interested
in the topic!
	Doug
---
      Doug Merritt        ucbvax!sun.com!cup.portal.com!doug-merritt
                      or  ucbvax!eris!doug (doug@eris.berkeley.edu)
                      or  ucbvax!unisoft!certes!doug