Newsgroups: comp.sys.amiga.programmer
Path: utzoo!utgpu!watserv1!watdragon!rose!ccplumb
From: ccplumb@rose.uwaterloo.ca (Colin Plumb)
Subject: Re: Bizarre Fantasy: RLL Flopies?
Message-ID: <1991Apr4.091810.5359@watdragon.waterloo.edu>
Keywords: MFM, RLL, Floppies, Blitter, Clueless, Fantasy, etc.
Sender: news@watdragon.waterloo.edu (News Owner)
Organization: University of Waterloo
References: <1991Apr2.061633.17980@kessner.denver.co.us>
Date: Thu, 4 Apr 1991 09:18:10 GMT
Lines: 238

chris@kessner.denver.co.us (Chris Hanson [Lord Xenon]) wrote:
> RLL 2,7 typically gives an approximate
> 150% storage space increase, presumably by using only 3 bits to encode
> 2 input bits, instead of MFM's piggish 4 bits out for every 2 bits in.

Nope, 2,7 RLL also encodes 1 "data" bit as 2 "raw" bits.  You want
something I've spent a while thinking about, 1,7 RLL.

>   Now, I know that RLL hard drives use more than just a different type
> of encoding algorithm, as you can not always use a MFM HD with and RLL
> controller, successfully. Am I missing something? Is anyone out there
> familliar with the RLL 2,7 algorithm?

Okay, here's a tutorial on magnetic recording technology.  When you
record something on a disk, the north poles of the ferrite particles
are aligned either with the direction of rotation, or against it.
There is "vertical recording" technology in the labs that points them
up and down, but this is how it works for now.

Now, a "raw" 1 bit is written as a *change* in the direction of
magnetisation, while a raw 0 bit is written as no change.  Thus, a
track is made up of a bunch of little bar magnets of various lengths,
each corresponding to a gap between adjacent 1 bits.

If the gap is too small, the magnet is weak and easily erased.  This is
not good.  Standard floppies, spinning at 300 rpm, are engineered to
use a 4us gap size.  Now, on the outer tracks, this is a lot more
length than on the inner ones, so you can usually get away with a
much smaller (2us) gap and have it work pretty reliably... a number of
copy-protection schemes do this.  But it's not guaranteed to and is Bad
Practice.

On the other hand, while reading the disk, the computer has to measure
the lenths of the gaps to determine how many 0 bits to put in each.
It synchronises with the 1 bits and, if it hasn't received another 1
bit by the time 1.5 bits have rolled around, assumes it's a 0 bit and
goes on.

These two parameters specify minimum and maximum sizes on runs of zeros
between adjacent 1's.  This is what a run-lengh-limited code is all about.
MFM is a (1,3) RLL code.  GCR is a (0,2) RLL code.  There are lots of
others.

Now, these restrictions mean that one raw bit is not completely
arbitrary, so can't carry one complete bit's worth of information.  In
general, the more zero bits that must follow a 1 bit, the less dense a
code is, and the more zero bits that can, the more dense a code is.  A
bit of elementary combinatorics (omitted), comes up with the following
theoretical limits on data bits per raw bit for various RLL
restrictions:

(0,infinity) 1.00000000
(0,2)        0.87914642

(1,infinity) 0.69424191
(1,7)        0.67928626
(1,6)        0.66903164
(1,3)        0.55146308 \
                         > Yes, these two are identical
(2,infinity) 0.55146308 /
(2,7)        0.51736957

Now, The actual densities achieved are .8 for GCR (0,2), .5 for MFM
(1,3), and .5 for (2,7).  Seen in this light, MFM doesn't do so badly..
it's about 90% of optimal.  (2,7) RLL is within 3.5%, while GCR is also
about 90% of optimal.

Now, the first digit (0, 1, 2 in the above) is the shortest run of 0's
allowed.  Adding a 1 to terminate the run should leave at least 4us
between 1 bits, so a (0,n) RLL code must be written a 4us/bit, a
(1,n) RLL code can be written at 2us/bit, and a (2,n) RLL code can be
written at 1.33us/bit.  The ratio (6/4) of bit times between (1,n) RLL
and (2,n) RLL codes is the 50% density increase you get with RLL.

However, as the second number goes up, you have to be more accurate at
counting bit times to make sure you don't read 7 0's as 6 or 8.  There
are all kinds of sources of error in reading magnetic media that conspire
to make it difficult.

The first observation: going to (2,7) RLL lets you use shorter bit times.
Since the Amiga can't write at other than 2 and 4 us/bit, that doesn't
gain us anything.  However, there is a reason I included the (1,6)
and (1,7) entries... since they're (1,n) RLL codes, they are designed
to be written at 2us/bit, but they have densities greater than 2/3,
so we can get 2 data bits per 3 raw bits, i.e. 4 per 6, versus 3 per 6
for MFM.  *This* is a possiblity.

Well, there is one little problem... complexity.  To get close to the
theoretical density is difficult, and while MFM and GCR are fairly
simple, (2,7) RLL is rather more complex, and (1,6) RLL, which is
less than 1% off optimal, is not really practical.  There is, however,
a practical (1,7) RLL code, with density 2/3.  You can flip bits around
and reassign combinations if you like, but it's basically:

00 -> 010
01 -> 001
10 -> 100
11 -> 001

So far, it's a lot like MFM... insert an extra bit between data bits,
which is 1 if and only if they're both zero.  But we don't insert
extra bits between these bit pairs, so we have four possible
problem cases where we'd get adjcent 1's, a no-no:

01.10 -> 001.100
01.11 -> 001.101
11.10 -> 101.100
11.11 -> 101.101

We replace these with special patterns:

01.10 -> 010.000
01.11 -> 001.000
11.10 -> 100.000
11.11 -> 101.000

Since each of these patterns ends in 0, the following bit group cannot
conflict with it.  Proceeding along in this way, we can translate pairs
of bits into triples.  A run of 7 0's is achieved when we have 0.11.10.01.0,
which translates into 0.100.000.001.0.  We can get two consecutive runs of
length 6 (0.11.10.01.10.01.0 -> 0.100.000.010.000.001.0), but the longest
sustained run length we can get is of length 5.

Anyway, armed with this encoding scheme, let's see what we can fit on
an Amiga floppy.  Each track is 60s/300rpm = .2s long, and at 2us/bit,
that's 100,000 bits.   Except the bits are actually 2.5% faster than that,
but may be up to +/- 5% due to genlocking, and the disk may be spinning
at +/- 1.5% of 300 rpm, so the final worst-case track length is, depending
on whether one interprets n% fast as (1+n/100) or (1-n/100), between 95914
and 96236 bits.

Now, using a density-2/3 code, one amiga sector (512 bytes plus 16 bytes
header area = 528 bytes) takes 792 bytes, or 6336 bits.  15 of these are
95040 bits, leaving between 874 and 1194 bits for bookkeeping.
H'm... that's 72.866 to 99.5 bytes... not much.  what can we fit in there?
We need sync marks, patterns that are legal (1,7) RLL but not produced
by our encoding scheme (in MFM, it's 100010010001, which is at the heart
of $4489 = 00100010010001001).  An adjacent 6-run and 7-run will do,
1000000100000001 or 1000000010000001, which conveniently just fit into 
16 bits, letting us use the wordsync register of the Amiga.  But we
can have one sync mark per pair of registers or so if we like.  Let's
call that 8 sync marks per track, for something over 128 bits, or 10 raw
bytes.  What to do with the remaining 62 to 89 bytes?  My idea is ECC.
We should have at least a 32-bit CRC on the track, cutting us down to
52 to 79 bytes.  But a Reed-Solomon error-correcting code can correct
one byte in error out of 257, including the two check bytes.  255 goes
into our 528*15=7920 data bytes 31.0588 times, so we need 32 pairs of
check bytes, which may or may not fit into that available space.  It
depends on whether the available space per track is 100000*.975*.95*.985
or 100000/1.025/1.05/1.015 bits.

Interleaving the Reed-Solomon ECC lets us correct burst errors up to 32
bytes long.  At 24us/byte, that's 768us, or quite a big glitch.

Oh, we also need a way to tell which sector on a track is number 1; we
can let the others follow in numberical order.  Sector number 16 will
be short and contain the 16-byte sector headers, 4 bytes of CRC and 64
bytes of ECC (308 bytes instead of 512).  Well, our sync mark needs to
be padded out with some 0's so it can abut legal patterns... let's
precede it with 0 and follow it with 010 for a normal sector and 000
for the short one.  So that makes 960 bits of overhead of the 874 to
1194 we have room for.  We need a few more bits to detect the track
gap, which I can be sneaky on by using the behaviour of the Amiga's
wordsync register, and I think that's it.

So, the format is basically:
Bits  Total
   18    18  01000000010000001010
12288 12306  1024 bytes of data encoded
   18 12324  01000000010000001010
12288 24612  1024 bytes of data encoded
   18 24630  01000000010000001010
12288 36918  1024 bytes of data encoded
   18 36936  01000000010000001010
12288 49224  1024 bytes of data encoded
   18 49242  01000000010000001010
12288 61530  1024 bytes of data encoded
   18 61548  01000000010000001010
12288 73836  1024 bytes of data encoded
   18 73854  01000000010000001010
12288 86142  1024 bytes of data encoded
   18 86160  01000000010000001000
 9840 96000  512+308 = 820 bytes of data encoded
  
96000 bits is 26.7% of the way through our uncertainty zone, so it's
probably safe.  A few more bits here and there will probably be
needed, but it looks like it (barely!) fits.

Now, what have we achieved?  We've got 15 512-byte sectors por track, per
side, for a total of 2400, or 1200K.  This is the same capacity as
IBM's "1.2 Meg" 5.25" floppies.  But for the agressive, we also have the
header label area (if you don't know what this is, look up ETD_READ in
the RKM's), which is an additional 37.5K, for a grand total of 1237.5K,
8.7K larger than honest 1.2 Meg = 1.2*1024K = 1228.8K.

That should keep people happy for a while.  Now, does anyone want to
*implement* this thing?
-- 
	-Colin

P.S. Reed-Solomon ECC works like this: you have a primitive element in
GF(2^8) (basically, a special kind of math on 8-bit numbers, where
addition is exclusive-or (and therefore the same as subtraction) and
multiplication is rather wierd, but still obeys all the usual rules),
call it p.  "Primitive" means that 1, p, p^2, p^3, etc.  are all
different until you hit p^255 = 1, where it repeats.  (Obviously, 0 is
not one of these numbers, and it hits every other 8-bit pattern)  Given
255 bytes of data, we form the first check byte "a" by xor-ing all the
data together, so if one byte is in error, parity computation will tell
us what the error is, i.e. the bit pattern to be added to (xor-ed with)
one of the bytes to produce the original data.  Then, the second check
byte "b" is data[0]+p*data[1]+p^2*data[2]+...+p^254*data[254].

If the parity byte tells us we have an error, call it "e", then computing
"b" again for the received data, where data[n] is in error, gives us
data[0]+p*data[1]+...+p^n*(data[n]+e)+...+p^254*data[254],
which is the original b plus p^n*e.  Subtracting the two thus gives
p^n*e, and then we can divide by e (since e is non-zero) to get p^n,
and since n is somewhere in the range 0..254, this uniquely identifies
n, and we can exclusive-or data[n] with e to recover the original
message.

If the parity byte itself is in error, check byte b checks out and we
don't correct the data.  If the check byte b is in error then, because
we're assuming no more than one bad byte, the other bytes are all
correct and the parity byte will not report an error.  The 32-bit CRC
serves as a final check to make sure we did all this correctly, since
two or more bytes in error will cause the Reed-Solomon code to
miscorrect something.

This error correction code is perfect in an information-theoretic sense.
We have 16 bits of check data, with 2^16 possible combinations.  When
we recompute the check data at read time, there are 2^16 possibilities
when we xor the old and new together.  Assuning there is no more
than 1 byte in error, there is 1 case in which there is no error, and
255 possible errors in each of 257 possible bytes.  (2^8-1)*(2^8+1) =
2^16-1, plus one for the no-error case gives a unique interpretation for
each error syndrome.