Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!wuarchive!zaphod.mps.ohio-state.edu!samsung!noose.ecn.purdue.edu!mentor.cc.purdue.edu!purdue!haven!uvaarpa!mmdf
From: worley@compass.com (Dale Worley)
Newsgroups: comp.lang.perl
Subject: Need help with error correction.
Message-ID: <1991Feb7.185429.18646@uvaarpa.Virginia.EDU>
Date: 7 Feb 91 18:54:29 GMT
Sender: mmdf@uvaarpa.Virginia.EDU (Uvaarpa Mail System)
Reply-To: worley@compass.com
Organization: The Internet
Lines: 25

> When I've had file transmission problems, I've used sum(1) to
> produce a checksum of the file on both the sending side machine and
> the receiving side machine and compared the results. If they
> weren't the same, then I knew that something got corrupted in the
> transmission and I got the file again.

But you're forgetting that in this application there are so many
errors that one cannot expect more than a few lines to get through
without error. The probability that the entire file gets through
without error is infinitesimal, and waiting for it to happen twice
would take forever.

Here's an idea: Break up each line of the file into, say,
ten-character lines. (In effect, you are using the newlines already
in the file to resynchronize the line-breaking algorithm.) The line
length should be chosen so that at least 3/4 of the created lines
have no errors in them.

Then apply GNU diff or diff3 (for speed) to the resulting files.
Since most of the ten-character lines get through uncorrupted, diff
should be able to discern how the two files correspond. Then you can
integrate the output of one or more diffs to reconstruct the file.

Dale Worley		Compass, Inc.			worley@compass.com
--
PHOTOVOLTAICS: safe and clean (but not cheap) electricity from the SUN.
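
The chunk-and-diff idea in the post above could be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
poster's own code: it uses Python's difflib in place of GNU diff or
diff3, it assumes three received copies are available, and the names
chunk, confirmed and merge are hypothetical helpers invented for the
example. A stretch that is damaged in more than one copy is left
unresolved (copy a is kept as-is).

```python
import difflib


def chunk(text, width=10):
    """Split every line into pieces of at most `width` characters.

    Chunking restarts at each newline, so an inserted or dropped
    character only shifts the pieces up to the end of its own line.
    """
    pieces = []
    for line in text.splitlines(keepends=True):
        for i in range(0, len(line), width):
            pieces.append(line[i:i + width])
    return pieces


def confirmed(x, y):
    """Indices of chunks in x that the diff aligns with an identical chunk in y."""
    sm = difflib.SequenceMatcher(None, x, y, autojunk=False)
    hits = set()
    for block in sm.get_matching_blocks():
        hits.update(range(block.a, block.a + block.size))
    return hits


def merge(a, b, c):
    """Combine three damaged copies (as chunk lists) into one chunk list.

    Where a and b agree, keep the chunks. Where they differ, prefer
    whichever side copy c confirms. Assumption: any given stretch is
    damaged in at most one of the three copies.
    """
    ok_a = confirmed(a, c)
    ok_b = confirmed(b, c)
    out = []
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        a_ok = all(i in ok_a for i in range(i1, i2))
        b_ok = all(j in ok_b for j in range(j1, j2))
        # Take b when c backs it up and b either has content here or
        # a's extra chunks are not backed up; otherwise fall back to a.
        if b_ok and (j2 > j1 or not a_ok):
            out.extend(b[j1:j2])
        else:
            out.extend(a[i1:i2])
    return out
```

Typical use would be "".join(merge(chunk(copy1), chunk(copy2),
chunk(copy3))), where copy1 through copy3 are the received texts. The
width of 10 is the post's suggested starting point; per the post it
should be tuned so that at least 3/4 of the chunks arrive undamaged.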