Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!wuarchive!zaphod.mps.ohio-state.edu!samsung!noose.ecn.purdue.edu!mentor.cc.purdue.edu!purdue!haven!uvaarpa!mmdf
From: worley@compass.com (Dale Worley)
Newsgroups: comp.lang.perl
Subject: Need help with error correction.
Message-ID: <1991Feb7.185429.18646@uvaarpa.Virginia.EDU>
Date: 7 Feb 91 18:54:29 GMT
Sender: mmdf@uvaarpa.Virginia.EDU (Uvaarpa Mail System)
Reply-To: worley@compass.com
Organization: The Internet
Lines: 25


   When I've had file transmission problems, I've used sum(1) to produce
   a checksum of the file on both the sending side machine and the
   receiving side machine and compared the results.  If they weren't the
   same, then I knew that something got corrupted in the transmission and
   I got the file again.

But you're forgetting that in this application there are so many
errors that one cannot expect that more than a few lines get through
without error.  The probability that the entire file gets through
without error is infinitesimal, and waiting for it to happen twice
would take forever.
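
To put rough numbers on that, here is a small Python sketch.  It
assumes each character is corrupted independently with probability
p_err, which is a simplification and not something measured on the
actual link.  The 2% error rate and the 10,000-character file size
are made-up illustrative values, and clean_probability is a
hypothetical helper name.

```python
def clean_probability(n_chars, p_err):
    """Chance that n_chars characters all arrive uncorrupted.

    Assumes independent per-character errors (an assumption made
    for illustration only).
    """
    return (1.0 - p_err) ** n_chars


# Illustrative values only: a 2% per-character error rate.
whole_file = clean_probability(10_000, 0.02)   # entire 10,000-char file
short_piece = clean_probability(10, 0.02)      # one ten-character piece
print(f"whole file clean:  {whole_file:.3g}")
print(f"short piece clean: {short_piece:.3g}")
```

Under these assumptions a short piece usually survives while the
whole file essentially never does, which is why comparing checksums
of the entire file does not help here.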

Here's an idea: Break up the lines into, say, ten-character lines.  (In
fact, you are using newlines in the file to resynchronize the
line-breaking algorithm.)  The line length should be chosen so that at
least 3/4 of the created lines have no errors in them.  Then apply GNU
diff or diff3 (for speed) to the resulting files.  Since most of the
ten-character lines get through uncorrupted, diff should be able to
discern how the two files correspond.  Then you can integrate the
output of one or more diffs to reconstruct the file.
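
The scheme above might be sketched in Python roughly as follows.
This is only an illustration under stated assumptions: it uses the
standard difflib module in place of GNU diff or diff3, it assumes
three received copies are available, and the names chunk and merge
are hypothetical.  The rule for choosing between two disagreeing
copies (prefer the side whose pieces also occur somewhere in the
third copy) is a crude heuristic, not a real three-way merge.

```python
import difflib


def chunk(text, width=10):
    """Split text into pieces of at most `width` characters.

    Every newline ends a piece, so a damaged line cannot shift the
    piece boundaries of the lines that follow it.
    """
    pieces = []
    for line in text.splitlines(keepends=True):
        pieces.extend(line[i:i + width] for i in range(0, len(line), width))
    return pieces


def merge(a, b, c):
    """Rebuild a piece list from three received copies.

    Where a and b agree, keep the shared pieces.  Where they differ,
    keep whichever side has more pieces that also appear in c.
    """
    seen_in_c = set(c)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    merged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(a[i1:i2])
            continue
        from_a, from_b = a[i1:i2], b[j1:j2]
        votes_a = sum(piece in seen_in_c for piece in from_a)
        votes_b = sum(piece in seen_in_c for piece in from_b)
        merged.extend(from_a if votes_a >= votes_b else from_b)
    return merged


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog\nand keeps running far away\n"
    # Three copies, each damaged in a different place.
    copy_a = original.replace("quick", "quXck")
    copy_b = original.replace("lazy", "lazQ")
    copy_c = original.replace("running", "runZing")
    rebuilt = "".join(merge(chunk(copy_a), chunk(copy_b), chunk(copy_c)))
    print(rebuilt == original)
```

With only two copies, the same opcodes still show where the copies
disagree, but not which one is right; that is where a third copy
(or diff3) comes in.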

Dale Worley		Compass, Inc.			worley@compass.com
--
PHOTOVOLTAICS: safe and clean (but not cheap) electricity from the SUN.