Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!cs.utexas.edu!uwm.edu!csd4.csd.uwm.edu!info-high-audio-request
From: tonyb@juliet.ll.mit.edu
Newsgroups: rec.audio.high-end
Subject: Re: Data Compression
Message-ID: <12205@uwm.edu>
Date: 17 May 91 13:52:16 GMT
Sender: news@uwm.edu
Lines: 59
Approved: tjk@csd4.csd.uwm.edu
Originator: tjk@csd4.csd.uwm.edu

In article <12182@uwm.edu> sethb@fid.Morgan.COM (Seth Breidbart) writes:

>...

>Proof that most files cannot be compressed:

>There are 2**8000000 different files 1 megabyte long.  There are
>2**7999999-1 different files with lengths less than 1 megabyte-11 bytes.
>The ratio of these is approximately 0.001.  Therefore, under 0.1% of
>all 1 megabyte files can be compressed by at least 1 bytes (which is
>0.000011% compression).

I'm not going to argue with the arithmetic here, only the logic of applying
it to music.  Human activity (outside of the White House, anyway) is *very*
far from random.
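
For anyone who wants to check the flavor of that counting argument, here is
a minimal Python sketch.  The function names are my own, and it uses a toy
file length of 64 bits instead of a megabyte; the bound it illustrates does
not depend on the length.

```python
from fractions import Fraction

def count_shorter(n_bits, k_bits):
    """Count all bit strings of length 0 .. n_bits - k_bits (inclusive)."""
    # 2**0 + 2**1 + ... + 2**(n-k) = 2**(n-k+1) - 1
    return 2 ** (n_bits - k_bits + 1) - 1

def max_fraction_compressible(n_bits, k_bits):
    """Upper bound on the fraction of n-bit files that ANY lossless
    scheme can shrink by at least k bits (pigeonhole argument: each
    compressed file needs its own distinct shorter output)."""
    return Fraction(count_shorter(n_bits, k_bits), 2 ** n_bits)

# Toy example: 64-bit "files", saving at least 11 bits.
bound = max_fraction_compressible(64, 11)
print(float(bound))
```

Saving at least 11 bits gives a bound just under 2**-10, roughly 0.001,
which is in the same ballpark as the ratio quoted above.  The bound says
nothing about *which* files land in the compressible set, and that is the
point of what follows.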

Our puny brains couldn't possibly process all the stimuli they receive
if there were not strong repeated elements in everything around us.  In
other words, I think it's misleading to scare people by showing them math
that implies that a 1 hour CD with 600 megabytes or so of data is coming
from the state-space that includes every possible combination of 300
million random 16 bit words.

Earlier in the discussion the issue of encode/decode 'horsepower' came up.
If we want to assume *lots* of it (especially at the encode end), I think
it's reasonable to assume that redundancies in the data could be found and
exploited quite handily.

Let's start with something in between a sine-wave test signal and a live
orchestral performance:  say, a solo piano piece played on a MIDI-equipped
keyboard.  The keyboard might have (without compression) 2 or so megabytes
of samples, and a MIDI playback sequence for an hour's worth of piano playing
might be another 2 megabytes (just guessing -- I bet it's smaller).  Add
another couple of kilobytes for parameters to set up an ambience processor,
and you just accomplished 150-to-1 data compression without trying too hard
at all!
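
The arithmetic behind that 150-to-1 figure, as a small Python sketch.  Every
number is one of the rough guesses from the paragraph above (with "a couple
of kilobytes" taken as 2000 bytes), and the names are made up for the
illustration.

```python
# All sizes in bytes; these are guesses from the text, not measurements.
CD_BYTES = 600_000_000       # about one hour of CD audio
SAMPLE_BYTES = 2_000_000     # piano samples stored in the keyboard
MIDI_BYTES = 2_000_000       # MIDI sequence for an hour of playing
AMBIENCE_BYTES = 2_000       # ambience processor parameters

def compression_ratio(original, *parts):
    """Ratio of the original size to the total size of the replacement parts."""
    return original / sum(parts)

ratio = compression_ratio(CD_BYTES, SAMPLE_BYTES, MIDI_BYTES, AMBIENCE_BYTES)
print(f"about {ratio:.0f}-to-1")   # prints "about 150-to-1"
```

If the MIDI sequence really is smaller than 2 megabytes, the ratio only
improves.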

Progressing to the "very high horsepower" realm, I can imagine a very smart
encoder listening to a *real* piano performance and sorting the piano notes
out from the hall ambience, and working backwards towards the MIDI file I was
just talking about.  Add a few cough and program-wrinkling samples, and voila,
there's your 150-to-1 compression again.  For that matter, using our currently
available lossless file compression techniques on the MIDI and sample files
would probably up this to 1000:1.

OK, so this is beyond our current state of the art, and it has nothing to do
with the compression algorithms that DCC will use.  But it does have something
to say about how non-random music is.

Back in the world of current technology, even adaptive delta modulation
schemes can deliver significant compression ratios on music without loss
(although apparently not significant enough to get things down to DCC
bandwidth).  That in itself should be a pretty clear indicator that the sound
we want to hear comes from well within that set of "0.1% of all files" that
you mentioned.
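
To make that concrete, here is a rough Python sketch.  It is NOT adaptive
delta modulation and has nothing to do with what DCC does: it is plain
first-difference (delta) coding followed by the general-purpose zlib
compressor, run on a synthetic two-tone signal that I am assuming as a crude
stand-in for music.  The same compressor is also run on random bytes for
comparison.

```python
import math
import random
import struct
import zlib

RATE = 44100  # samples per second, as on a CD

def delta_encode(samples):
    """Replace each sample by its difference from the previous one."""
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def delta_decode(deltas):
    """Undo delta_encode exactly (running sum), so nothing is lost."""
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out

def pack16(values):
    """Pack integers as little-endian signed 16-bit words.
    Assumes every value fits in 16 bits, true for the signal below."""
    return struct.pack(f"<{len(values)}h", *values)

# One second of a synthetic 220 Hz + 330 Hz tone (a stand-in, not real music).
samples = [round(8000 * math.sin(2 * math.pi * 220 * t / RATE)
                 + 8000 * math.sin(2 * math.pi * 330 * t / RATE))
           for t in range(RATE)]

raw = pack16(samples)
delta_size = len(zlib.compress(pack16(delta_encode(samples))))

# The same number of bytes of seeded pseudo-random noise.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(raw)))
noise_size = len(zlib.compress(noise))

print(f"tone:  {len(raw)} bytes -> {delta_size} bytes, lossless")
print(f"noise: {len(noise)} bytes -> {noise_size} bytes")
```

The structured signal shrinks and decodes back bit for bit, while the noise
does not shrink at all.  A real recording falls somewhere between these two
extremes.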

All this said, I'd still rather have a lossless compression scheme for DCC.

Tony Berke (tonyb@juliet.ll.mit.edu)