Path: utzoo!attcan!uunet!microsoft!w-colinp
From: w-colinp@microsoft.UUCP (Colin Plumb)
Newsgroups: comp.sys.amiga
Subject: Re: Getting 1.5 MB on a standard 880K floppy without compression
Message-ID: <40@microsoft.UUCP>
Date: 23 Mar 89 14:29:39 GMT
References: <3631@sugar.hackercorp.com> <94323@sun.Eng.Sun.COM>
Reply-To: w-colinp@microsoft.uucp (Colin Plumb)
Organization: very little
Lines: 208

cmcmanis@sun.UUCP (Chuck McManis) wrote:
> In article <3631@sugar.hackercorp.com> (Karl Lehenbauer) writes:
>> In the Roomers column of the March 1989 issue of Amazing Computing, the
>> Bandito says that certain game companies are getting 1.5 MB unformatted
>> on standard 880K floppies.
>> Can anyone verify if this is true and if so, explain how it works?
> 
> [They seem to use RLL.]

RLL isn't quite the answer.  It's doable, but it's more interesting than
that...

On a disk, bits are encoded as magnetic domains.  If you could see the
north poles of the bits of oxide along a track, they'd be pointing
alternately clockwise and counterclockwise.

A spot where the direction of magnetisation changes is a 1; no change
in magnetic field is a 0.  When the read head passes over, it detects
the 1's as a current pulse.  0's have to be figured out by timing.
Since the standard bit time for 3.5" (really, 90mm) floppies is 2us,
if two pulses are detected 2us apart, that was 2 consecutive 1's.  If
the gap was 4us, that was 101, etc.

If the gap is too long, the controller can mis-count and get a wrong
number of 0's.  If the gap is too short, there's a really tiny section
on the disk that's pointing the opposite way to its neighbours.  It
can get swamped and vanish.  Very bad.

At 300 rpm and on the inner tracks (track 79), the minimum time between
1's works out to 4us.  On the outer tracks (track 0), you can achieve
the same linear density by speeding up the bit clock (a la Commodore
PET and C-64), or by slowing down the drive motor (Macintosh).  Or,
like the Amiga, ST, and IBM PC, you can not bother.

Now, I mentioned that the bit clock rate is 2us.  But you can't put
1's closer than 4us apart.  So we have to make sure we always put at
least one 0 between 1's.  Of course, we also have to be careful not
to have too long a string of 0's, either.

(Side note: 1200 bps and up modems have problems with long strings of 0's,
too.  They solve it by XORing in a pseudo-random bit stream which the
modems work out between themselves.  This is referred to as the scrambler
polynomial.)

This is where RLL comes in.  RLL is Run Length Limited.  The first disk
drives used a 4us bit clock, alternating data bits and 1 bits for
synchronisation.  Thus, the longest string of 0's was of length 1.
I.e. runs of 0's were limited to length 1.  But because they could
have adjacent 1's (the lower bound on run lengths was 0), they needed
to use a slow bit clock for this 0,1 RLL scheme.  This is known as
Frequency Modulation.  (It is, in fact, just like FM stereo.  A stream
of 1's would get writtten 1111111111, each 1 standing for a polarity
change.  The signal would end up looking like a square wave of period
8us.  A stream of 0's would get written as 010101010101, a square wave
of half the frequency.)

Then someone worked out a better scheme, MFM (Modified FM), which is still
the most common today, in which 1's were always separated by at least 1
zero, and by at most 3.  Thus, 1,3 RLL.  It's just like FM (data, clock,
data, clock), except that the clock 1's are only inserted between two
consecutive 0's.  This makes it harder to figure out what's clock and
what's data (in FM, if you find a 0, you know it's data), but there
are a few patterns that still are 1,3 RLL but aren't generated by this
scheme.  The standard sync mark is A1, 10100001 which, with clock pulses
inserted (using i and o for 1 and 0 so you can tell the difference) is
1o0o1o0i0i0i0o1o.  By suppressing the second clock pulse (the i after
the third 0), you get the raw bit pattern 100010010001001, which cannot
appear in normal data (try it!).  Because the floppy controller (the
trackdisk.device, in this case) eats the clock pulses, it reads as A1.
But it's really magic.

Because you always have a 0 between adjacent 1's, and you can write 1's
4 us apart, you can write raw bits 2 us apart and stay within the
capabilities of the media.  2us per bit, and 0.2 seconds per revolution
(300 rpm = 5 rps) gives you 100,000 bits per revolution.  80 tracks,
times two sides, gives 16,000,000 bits per disk.  Since it takes two
raw bits to produce one data bit, that's 8,000,000 data bits, or
1,000,000 (not 1 Meg = 1024x1024) bytes per floppy.

There are, however, more complex schemes.  What people refer to as RLL
in the IBM world uses a 2,7 RLL code.  Because you always have 2 0 bits
between 1's, you can use a 1.333 us bit clock and still keep your 1's
far enough apart.  (Actually, it's used for hard drives, which can take
1 bits much closer than 4 us, but the idea is the same.)  The only loss
is that your read circuitry has to be able to keep time for 8 bit periods
as it skips 7 0's.  (Also, your media has to stop the boundary from
"wandering" as much, since you have less tolerance.  But this is usually
pretty easy compared with making it accept 2.666 us gaps between 1's.)

The 2,7 RLL code still manages to encode 1 data bit as 2 "raw" bits,
although it's not nearly as simple as data, clock, data.  Even if
you wanted to go to all the trouble to make the trackdisk.device use
this encoding scheme, it wouldn't help, since you still can't write
raw bits any faster than 2 us.

But you can, however, use a different 1,n RLL scheme.  If you apply a
bit of math, it turns out that 1,3 RLL can give you 0.55146 data bits
per raw bit.  0.5 of that is used by MFM, and a bit extra is used by
the A1 sync pattern.  2,7 RLL gives you 0.51736 data bits per raw
bit.  Agian, 0.5 of that is used for data, and some of the rest is used
for a sync marker.

A 1,6 RLL scheme will give you 0.66903 data bits per raw bit... enough
to spend two thirds of a raw bit encoding data and have a little left
over for a unique sync pattern.  However, wringing the last few drops
of efficiency from an RLL code is hard (you have about 0.00237 bits
to fit your sync marker in here, i.e. it's got to be 1/0.00237 is more
than 40 bits long), so I'd use a 1,7 RLL code, which gives 0.67928 bits
per bit, giving you 0.01262 bits to fit a sync pulse in, resulting in
a more reasonable 8-bit sync pattern to look for.

I still haven't actually come up with an encoding scheme, but I expect that
just playing with it a bit will produce something.

There's also an out I haven't mentioned: perhaps you can push the capabilities
of the disk a little, and write adjacent 1 bits on the outer tracks.  I think
this would put them closer than the disk is rated for, but high-quality
floppies can probably take it.

But playing by the rules, I's use 1,7 RLL.  Now, you're wondering how
I came up with that figure of 0.67928 bits per raw bit.  Here's where
we get into Combinatorics.

I goes like this: the null string is 1,6 RLL.  No bits, no runs of the
wrong length.  Furthermore, and 1,6 RLL string can be followed by any
of 10, 100, 1000, 10000, 100000, or 1000000 and produce a 1,6 RLL
string.

So, given a string of length n, we can produce strings of length n+2, n+3,
n+4, n+5, n+6, and n+7.  These are the only strings we can produce
directly, so we can turn it around: we can make a string of length n
from any string of length n-2, n-3, n-4, n-5, n-6, or n-7.  So S(n), the
number of strings of length n, is S(n-2)+S(n-3)+S(n-4)+S(n-5)+S(n-6)+S(n-7).
And we know that there is one string of length 0, and none of length <0.
S(0) = 1, S(-1) = S(-2) = ... = 0.  From this, you can probably see how
to write a program to compute the number of strings of length n that are
1,6 RLL.  For any p,q RLL, actually.  S(n) = S(n-p)+S(n-p-1)+...+S(n-q-1).

All I did was use really long strings (about a million bits) and take the
log base 2 (log base x of n is log(n)/log(x) for log() in any other base)
of the number of possible strings, which gives the number of bits of
information you can encode in such strings, and divide this by the number
of bits I used (about a million).

A few other numbers:

0,1  RLL: 0.69424	/* FM gets 0.5 of this - inefficient! */
0,2  RLL: 0.87914
0,3  RLL: 0.94677
0,4  RLL: 0.97522
0,5  RLL: 0.98810
0,6  RLL: 0.99419
0,7  RLL: 0.99713
0,8  RLL: 0.99857
0,9  RLL: 0.99929
0,10 RLL: 0.99964

/* 0,infinity RLL is, of course, 1 */

1,1  RLL: 0		/* I assume you can figure out why */
1,2  RLL: 0.40568
1,3  RLL: 0.55146
1,4  RLL: 0.61744
1,5  RLL: 0.65089
1,6  RLL: 0.66903
1,7  RLL: 0.67928
1,8  RLL: 0.68525
1,9  RLL: 0.68878
1,10 RLL: 0.69091

2,3  RLL: 0.28776
2,4  RLL: 0.40568
2,5  RLL: 0.46495
2,6  RLL: 0.49790
2,7  RLL: 0.51736
2,8  RLL: 0.52933
2,9  RLL: 0.53691
2,10 RLL: 0.54179

When doing this computation, you'll quickly overflow 32 bits and even double-
precision floating point.  (After all, a 1 million bit long string in
2,7 RLL has over 2^500,000 possibilities.)  I cobbled up my own pseudo
floating-point, where if the sum S(n) ever exceeeded 1<<24, I divided it by
2 and incremented an exponent word (32 bits, of course).  Adding two numbers
in this form isn't too tricky, expecially since in this computation, you
always know which is the larger (S(n) > S(n-1), after a few initial bobbles),
and you can say:

big += little>>(big_exponent-little_exponent);
if (big > (2<<24)) {
	big >>= 1;
	big_exponent++;
}

I just kept a circular buffer of the number of strings of
length n-1 through n-q-1, and printed out (exponent+log_2(mantissa))/n
every thousand times through the loop.  (If you do it every time,
keep a check for mantissa==0, which is true for length=1 if the minimum
run length > 0.)  If someone else would like to cobble up a routine and
verify my numbers, great.  (If you remember enough combinatorics to get
a closed-form expression for the number of bits, even better.  I tried,
but fell back on numerical approximations.)

So, is everything clear now?  Good.
-- 
	-Colin (uunet!microsoft!w-colinp)

"Don't listen to me.  I never do." - The Doctor