Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!decwrl!labrea!aurora!eos!ames!necntc!ncoast!allbery From: mark@jhereg.Jhereg.MN.ORG.UUCP (Mark H. Colburn) Newsgroups: comp.sources.misc Subject: v02i063: Yet another version of PD diff Message-ID: <7448@ncoast.UUCP> Date: 29 Feb 88 01:16:11 GMT Sender: allbery@ncoast.UUCP Lines: 2553 Approved: allbery@ncoast.UUCP X-Archive: comp.sources.misc/8802/21 Comp.sources.misc: Volume 2, Issue 63 Submitted-By: "Mark H. Colburn" Archive-Name: pd-diff-v2 Comp.sources.misc: Volume 2, Issue 63 Submitted-By: "Mark H. Colburn" Archive-Name: pd-diff-v2 [Wonderful timing. Anyone want to merge the new patches into the new diff? Sigh. ++bsa] Here is yet another copy of the Public Domain Context Diff program which was first posted in January. I have made some changes to speed it up quite significantly. I have also merged all of the prior changes into the copy, updated the Makefile, README, etc. Since the diffs were quite large, I just thought that I would send you the whole thing. Addition comments are at the tail of the README file. -- Mark H. Colburn mark@jhereg.MN.ORG ihnp4!chinet!jhereg!mark #! /bin/sh # This is a shell archive. Remove anything before this line, then unpack # it by saving it into a file and typing "sh file". To overwrite existing # files, type "sh file -c". You can also feed this as standard input via # unshar, or by typing "sh Makefile <<'END_OF_Makefile' X# X# Makefile for Public Domain Diff X# X# the two 'include' directive are for these USG 5.3 machines (or 3B1) which X# have shared libraries. Comment them out if you do not have shared X# libraries on your system. X Xinclude $(MAKEINC)/Makepre.h X X# X# You may want to use one or more of the following definitions when compiling, X# depending on what your target environment is, and whether you want debugging X# turned on or off. X# X# TURBO - compile for TURBO C target (See TC-READ.ME for details) X# AMIGA - compile for AMIGA target X# OSK - compile for OSK X# DEBUG - turn on debugging (implies TIMING) X# TIMING - turn on timing code X# MSC - Microsoft C 4.0 (small model support) X# X# additional support for vax, or pdp11 are invoked by defining 'vax' X# or 'pdp11'. These should be defined by your preprocessor if you are X# on one of these machines. X# X# CFLAGS = -O -DTURBOC XCFLAGS = -O XLDFLAGS = -s X X# The first line ('cc ...') is for standard C compilers, the second line X# ('$(LD) ...') is for shared library machines. Use whichever line is X# appropriate for your system and comment the other one out. X Xdiff: diff.o X# cc $(LDFLAGS) diff.o -o diff X $(LD) $(SHAREDLIB) $(LDFLAGS) diff.o -o diff X Xclean: X rm -f *.o a.out core diff X Xinclude $(MAKEINC)/Makepost.h END_OF_Makefile if test 1272 -ne `wc -c README <<'END_OF_README' XComp.sources.misc: Volume 2, Issue 1 XArchive-Name: pd-diff XSubmitted-By: blarson@skat.usc.edu (Bob Larson) X XHere's a public domain diff with the -b and -c options. (4.2bsd style Xcontex diffs.) I wasn't aware that these wern't present in all unix Xversions of diff, so I didn't think posting it was a priority. It's Xlarge, slow, and many of the comments are no longer true, but it does Xwork. (except when it runs out of memory) The one case I know of Xwhere its output is incompatable with patch does seem to be pretty Xrare. No makefile is included, the 4.2bsd diff is better on the unix Xsystem I use. If you don't know how to compile and load a single C Xprogram, this probably isn't the tool for you anyway. I'd be grateful Xto anyone who cleans this up and documents it properly. It does Xappear to have been separate files at some point, I'm presenting it in Xa form similar to how I got it: mail headers and outdated documentation Xin comments and all. I just banged on it enough to get it doing what XI wanted. X X--- XComp.Sources.Misc: Volume 2, Issue 8 XSubmitted-By: XArchive-Name: cdiff-v2 X XAfter receiving Bob Larson's sources for the PD context diff program, XI decided to accept his challenge to rewrite the documentation. In Xthe process, I also ported it to TURBOC version 1.5. It probably will Xalso compile in TURBOC 1.0, but since getting the update I dispensed Xwith the previous version and did not try it. X XThe code has been reorganized to strip it of the documentation that Xwas built into it; that has been moved to the file cdiff.mem. Thus, Xthe following shar file includes cdiff.c, cdiff.1 (man source), cdiff.mem X(the previously built-in documentation), cdiff.doc (cdiff.1 passed Xthrough nroff -man for those who do not have nroff available), the Xoriginal README, and a new TC-READ.ME. Follow the notes in TC-READ.ME Xor it will run even slower! X XOf course, no warranties whatsoever go with this. I merely hacked the Xcode minimally. I didn't write it. X--- X XComp.sources.misc: Volume 2, Issue 59 XSubmitted-By: XArchive-Name: pd-cdiff-patch X XNeil Dixon uncovered a flaw in the logic of the cdiff program that Xwas distributed early in January, and which was redistributed with Xchanges to make it compilable in Turbo C. I've tested his patch Xboth on the Unix SysVr2 version and on the PC, and have not found Xany errors. Conversely, the earlier version when compiled in MSC X4.0 (but, for some reason, not when compiled in TC 1.5) would Xsporadically come up with "read" errors. Since it now works in MSC as Xwell as TC, I've included the appropriate ifdefs for both compilers, Xand have incorporated Neil's patch. (This was for clarity. The line Xnumbers in his patch did not correspond precisely to the line numbers Xin the distributed code.) Both the patch as sent to me and the Xrevised code are contained below. X XAs before, I did not write this code. I merely ported it, and of Xcourse make no warranties whatsoever. X X--- X XOk, I guess that I will add my two cents worth. Here is yet another Xrepost of the public domain diff program. X XI have integrated some changes into the i/o portion of the code, providing Xsome significant speedups. These changes were made after spending two Xevenings playging around with the profiler, attempting various fixes to Xmake this beast a little faster. I completed this prior to the latest release Xof the code (the version listed immediately above). X XI have attempted to merge the changes provided by Mike above, but, since I do Xnot have any other machines close by, I could not test them. X XThe changes which I made are in the following areas: X X * modified the fgetss() and fputss() routines. These were the primary X areas of intense activity on the system. From the source that I X could see, these changes should be portable. After timing this X on my 3b1, the changes make this diff run at about the same speed X as the system diff for the files that I was using (amazing isn't it?). X X * Moved the defines from within the source code to within the Makefile. X X * Ran the code through indent. Sorry about that, but it was the only X way that I could make sure that I got all the other patches integrated X properly. X X * Cleaned up some of the comments and added a few of my own. X X * Made a few tweaks to make lint happier. X X * Modified the Makefile to allow use of shared libraries. Included X instructions for all the defines in the system as well. X XMark H. Colburn (mark@jhereg.mn.org) END_OF_README if test 4508 -ne `wc -c TC-READ.ME <<'END_OF_TC-READ.ME' X X1. Make certain that the first line is a #define TURBO 1 line. X2. Compile in the Large model. X3. Set the optimizations: X a) to optimize for speed, not size X b) to use register variables X c) to invoke register optimization X c) to invoke jump optimization X4. Even with the foregoing, this program can run slowly on a large file X (as much as 60 seconds to compare two 40K files in a 4.77 MHz PC!) X Be patient. X END_OF_TC-READ.ME if test 419 -ne `wc -c diff.1 <<'END_OF_diff.1' X.TH DIFF 1 X.SH NAME Xdiff \- Public Domain diff (context diff) program X.SH SYNOPSIS X.B diff X[ X.B \-b \-c \-i \-e X] file1 file2 X.SH DESCRIPTION X.I Diff\^ Xcompares two files, showing what must be changed to make them Xidentical. Either file1 or file2 (but not both) may refer to Xdirectories. If that is the case, a file in the directory whose Xname is the same as the other file argument will be used. The Xstandard input may be used for one of the files by replacing the Xargument by "-". Except for the standard input, both files must Xbe on disk devices. X.SH OPTIONS X.HP X.B \-b XRemove trailing whitespace (blanks and tabs) Xand compress all other strings of whitespace to a single blank. X.HP X.B \-c XPrint some context -- matching lines before Xand after the non-match section. Mark non-matched sections Xwith "|". X.HP X.B \-i XIgnore lower/upper case distinctions. X.HP X.B \-e XOutput is in an "editor script" format which Xis compatible with the Unix 'ed' editor. X.PP XAll information needed to compare the files is maintained in main Xmemory. This means that very large files (or fairly large files Xwith many differences) will cause the program to abort with an X"out of space" message. Main memory requirements (in words) are Xapproximately: X.br X X.ce X2 * (length of file1 + length of file2) X.br X.ce X+ 3 * (number of changes) X.br X.PP X(Where "length" is the number of lines of data in each file.) X.PP XThe algorithm reads each file twice, once to build hash tables Xand once to check for fortuitous matches (two lines that are in Xfact different, but which have the same hash value). CPU time Xrequirements include sorting the hash tables and randomly Xsearching memory tables for equivalence classes. For example, on Xa time-shared VAX-11/780, comparing two 1000 line files required Xabout 30 seconds (elapsed clock time) and about 10,000 bytes of Xworking storage. About 90 per-cent of the time was taken up by Xfile I/O. X.SH DIAGNOSTICS X.HP XWarning, bad option 'x' X.br XThe option is ignored. X.HP XUsage ... X.br XTwo input files were not specified. X.HP XCan't open input file "filename". X.br XCan't continue. X.HP XOut of space X.br XThe program ran out of memory while comparing the two files. X.HP XCan't read line nnn at xxx in file[A/B] X.br XThis indicates an I/O error when seeking to the specific line. XIt should not happen. X.HP XSpurious match, output is not optimal. X.br XTwo lines that were different yielded the same hash value. This is Xharmless except that the difference output is not the minimum set of Xdifferences between the two files. For example, instead of the output: X.ce Xlines 1 to 5 were changed to ... X.br Xthe program will print X.ce Xlines 1 to 3 were changed to ... X.br X.ce Xlines 4 to 5 were changed to ... X.br X.HP XThe program uses a CRC16 hash code. X.br XThe likelihood of this error is Xquite small. X.SH AUTHOR XThe diff algorithm was developed by J. W. Hunt and M. D. McIlroy, Xusing a central algorithm defined by H. S. Stone. XIt was published in: X.in +5 X.nf XHunt, J. W., and McIlroy, M. D., XAn Algorithm for Differential File Comparison, XComputing Science Technical Report #41, XBell Laboratories, Murray Hill, NJ 07974 X.fi X.in -5 X.SH BUGS XOn RSX and DECUS C on VMS systems, diff may fail if the both Xfiles are not "variable-length, implied carriage control" format. XThe scopy program can be used to convert files to this format if Xproblems arise. X.PP XWhen compiled under VAX C, diff handles STREAM_LF files properly X(in addition to the canonical variable-length implied carriage Xcontrol files). Other variations should work, but have not been Xtested. X.PP XWhen compiled under VAX C, diff is quite slow for unknown reasons Xwhich ought to be investigated. On the other hand, it has access Xto effectively unlimited memory. X.PP XOutput in a form suitable for ed - the -e option - seems rather Xpointless; the analogue on DEC systems is SLP (SUMSLP on VMS). It Xwould be simple to provide SLP-compatible output. The question Xis, why bother - since the various DEC file comparison utilities Xalready produce it. END_OF_diff.1 if test 4000 -ne `wc -c diff.c <<'END_OF_diff.c' X/* X * diff.c - public domain context diff program X * X */ X X#ifdef TURBO X#include X#include X#include X#include X#include X#include X X#else /* !TURBO */ X#include X#include X#endif /* TURBO */ X X#ifdef vms X#include X#include X#define IO_SUCCESS (SS | STS) X#define IO_ERROR SS X#endif /* vms */ X X/* X * Note: IO_SUCCESS and IO_ERROR are defined in the Decus C stdio.h file X */ X X#ifndef IO_SUCCESS X#define IO_SUCCESS 0 X#endif /* IO_SUCCESS */ X#ifndef IO_ERROR X#define IO_ERROR 1 X#endif /* IO_ERROR */ X X#define EOS 0 X#ifdef unix Xchar temfile[L_tmpnam]; Xchar *tmpnam(); X#define TEMPFILE (temfile[0]? temfile: (tmpnam(temfile), temfile)) X#else /* unix */ X#define TEMPFILE "diff.tmp" X#endif /* unix */ X#define TRUE 1 X#define FALSE 0 X X#ifdef pdp11 X#define short int X#endif /* pdp11 */ X Xtypedef struct candidate { X int b; /* Line in fileB */ X int a; /* Line in fileA */ X int link; /* Previous candidate */ X} CANDIDATE; X Xtypedef struct line { X unsigned short hash; /* Hash value etc. */ X short serial; /* Line number */ X} LINE; X XLINE *file[2]; /* Hash/line for total file */ X#define fileA file[0] X#define fileB file[1] X XLINE *sfile[2]; /* Hash/line after prefix */ X#define sfileA sfile[0] X#define sfileB sfile[1] X Xint len[2]; /* Actual lines in each file */ X#define lenA len[0] X#define lenB len[1] X Xint slen[2]; /* Squished lengths */ X#define slenA slen[0] X#define slenB slen[1] X Xint prefix; /* Identical lines at start */ Xint suffix; /* Identical lenes at end */ X XFILE *infd[2] = {NULL, NULL}; /* Input file identifiers */ XFILE *tempfd; /* Temp for input redirection */ X Xextern long ftell(); Xextern FILE *fopen(); X X#ifdef TURBO Xextern void *malloc(); X#else /* !TURBO */ Xextern char *malloc(); X#endif /* TURBO */ X Xchar *fgetss(); Xunsigned short hash(); X X#ifdef AMIGA X/* Define these types for Amiga C */ Xchar *savptr; Xint savsiz; Xchar *wrk; Xchar *wrk2; Xint cpysiz; X#endif /* AMIGA */ X X/* X * The following vectors overlay the area defined by fileA X */ X Xshort *class; /* Unsorted line numbers */ Xint *klist; /* Index of element in clist */ XCANDIDATE *clist; /* Storage pool for candidates */ Xint clength = 0; /* Number of active candidates */ X#define CSIZE_INC 50 /* How many to allocate each time we have to */ Xint csize = CSIZE_INC; /* Current size of storage pool */ X Xint *match; /* Longest subsequence */ Xlong *oldseek; /* Seek position in file A */ X X/* X * The following vectors overlay the area defined by fileB X */ X Xshort *member; /* Concatenated equiv. classes */ Xlong *newseek; /* Seek position in file B */ Xchar *textb; /* Input from file2 for check */ X X/* X * Global variables X */ X Xint eflag = FALSE; /* Edit script requested */ Xint bflag = FALSE; /* Blank supress requested */ Xint cflag = FALSE; /* Context printout */ Xint iflag = FALSE; /* Ignore case requested */ Xchar text[257]; /* Input line from file1 */ Xextern char *myalloc(); /* Storage allocator */ X Xextern char *compact(); /* Storage compactor */ X X#ifdef DEBUG X#ifndef OSK X#define TIMING X#endif /* OSK */ X#endif /* DEBUG */ X#ifdef TIMING Xextern long time(); Xextern char *4532mend; Xlong totaltime; Xlong sectiontime; Xchar *mstart; X#endif /* TIMING */ Xvoid free(); Xvoid exit(); X#ifndef OSK Xvoid perror(); X#endif /* OSK */ X X/* X * Diff main program X */ X Xmain(argc, argv) X int argc; X char **argv; X{ X register int i; X register char *ap; X X#ifdef OSK X extern int _memmins; X _memmins = 16 * 1024; /* tell OSK we will malloc a lot */ X#endif /* OSK */ X#ifdef TIMING X sectiontime = time(&totaltime); X#endif /* TIMING */ X#ifdef vms X argc = getredirection(argc, argv); X#endif /* vms */ X while (argc > 1 && *(ap = argv[1]) == '-' && *++ap != EOS) { X while (*ap != EOS) { X switch ((*ap++)) { X case 'b': X bflag++; X break; X X case 'c': X if (*ap > '0' && *ap <= '9') X cflag = *ap++ - '0'; X else X cflag = 3; X break; X X case 'e': X eflag++; X break; X X case 'i': X iflag++; X break; X X default: X fprintf(stderr, X "Warning, bad option '%c'\n", X ap[-1]); X break; X } X } X argc--; X argv++; X } X X if (argc != 3) X error("Usage: diff [-options] file1 file2"); X if (cflag && eflag) { X fprintf(stderr, X "Warning, -c and -e are incompatible, -c supressed.\n"); X cflag = FALSE; X } X argv++; X for (i = 0; i <= 1; i++) { X if (argv[i][0] == '-' && argv[i][1] == EOS) { X infd[i] = stdin; X if ((tempfd = fopen(TEMPFILE, "w")) == NULL) X cant(TEMPFILE, "work", 1); X } else { X infd[i] = fopen(argv[i], "r"); X if (!infd[i]) X cant(argv[i], "input", 2); /* Fatal error */ X } X } X X if (infd[0] == stdin && infd[1] == stdin) X error("Can't diff two things both on standard input."); X X if (infd[0] == NULL && infd[1] == NULL) { X cant(argv[0], "input", 0); X cant(argv[1], "input", 1); X } X#ifdef vms X else if (infd[1] == NULL) X opendir(1, &argv[1], infd[0]); X else if (infd[0] == NULL) X opendir(0, &argv[0], infd[1]); X#endif /* vms */ X X /* X * Read input, building hash tables. X */ X input(0); X input(1); X squish(); X#ifdef DEBUG X printf("before sort\n"); X for (i = 1; i <= slenA; i++) X printf("sfileA[%d] = %6d %06o\n", X i, sfileA[i].serial, sfileA[i].hash); X for (i = 1; i <= slenB; i++) X printf("sfileB[%d] = %6d %06o\n", X i, sfileB[i].serial, sfileB[i].hash); X#endif /* DEBUG */ X sort(sfileA, slenA); X sort(sfileB, slenB); X#ifdef TIMING X ptime("input"); X#endif /* TIMING */ X#ifdef DEBUG X printf("after sort\n"); X for (i = 1; i <= slenA; i++) X printf("sfileA[%d] = %6d %06o\n", X i, sfileA[i].serial, sfileB[i].hash); X for (i = 1; i <= slenB; i++) X printf("sfileB[%d] = %6d %06o\n", X i, sfileB[i].serial, sfileB[i].hash); X#endif /* DEBUG */ X X /* X * Build equivalence classes. X */ X member = (short *) fileB; X equiv(); X member = (short *) compact((char *) member, (slenB + 2) * sizeof(int), X "squeezing member vector"); X X /* X * Reorder equivalence classes into array class[] X */ X class = (short *) fileA; X unsort(); X class = (short *) compact((char *) class, (slenA + 2) * sizeof(int), X "compacting class vector"); X#ifdef TIMING X ptime("equiv/unsort"); X#endif /* TIMING */ X X /* X * Find longest subsequences X */ X klist = (int *) myalloc((slenA + 2) * sizeof(int), "klist"); X clist = (CANDIDATE *) myalloc(csize * sizeof(CANDIDATE), "clist"); X i = subseq(); X#ifndef OSK X free((char *) member); X free((char *) class); X#else /* OSK */ X free((char *) member - sizeof(int)); X free((char *) class - sizeof(int)); X#endif /* OSK */ X match = (int *) myalloc((lenA + 2) * sizeof(int), "match"); X unravel(klist[i]); X#ifndef OSK X free((char *) clist); X free((char *) klist); X#else /* OSK */ X free((char *) clist - sizeof(int)); X free((char *) klist - sizeof(int)); X#endif /* OSK */ X#ifdef TIMING X ptime("subsequence/unravel"); X#endif /* TIMING */ X X /* X * Check for fortuitous matches and output differences X */ X oldseek = (long *) myalloc((lenA + 2) * sizeof(*oldseek), "oldseek"); X newseek = (long *) myalloc((lenB + 2) * sizeof(*newseek), "newseek"); X textb = myalloc(sizeof text, "textbuffer"); X if (check(argv[0], argv[1])) X fprintf(stderr, "Spurious match, output is not optimal\n"); X#ifdef TIMING X ptime("check"); X#endif /* TIMING */ X output(argv[0], argv[1]); X#ifdef TIMING X ptime("output"); X printf("%ld seconds required\n", sectiontime - totaltime); X#endif /* TIMING */ X if (tempfd != NULL) { X fclose(tempfd); X#ifdef unix X unlink(TEMPFILE); X#else /* !unix */ X#ifdef OSK X unlink(TEMPFILE); X#else /* OSK */ X#ifdef MSC /* MSC 4.0 does not understand disjunctive X * #if's. */ X unlink(TEMPFILE); X#else /* MSC */ X remove(TEMPFILE); X#endif /* MSC */ X#endif /* OSK */ X#endif /* unxi */ X } X return(0); X} X X X/* X * Read the file, building hash table X */ X Xinput(which) X int which; /* 0 or 1 to redefine infd[] */ X{ X register LINE *lentry; X register int linect = 0; X FILE *fd; X#define LSIZE_INC 200 /* # of line entries to alloc at once */ X int lsize = LSIZE_INC; X X lentry = (LINE *) myalloc(sizeof(LINE) * (lsize + 3), "line"); X fd = infd[which]; X while (!getline(fd, text)) { X if (++linect >= lsize) { X lsize += 200; X lentry = (LINE *) compact((char *) lentry, X (lsize + 3) * sizeof(LINE), X "extending line vector"); X } X lentry[linect].hash = hash(text); X } X X /* X * If input was from stdin ("-" command), finish off the temp file. X */ X if (fd == stdin) { X fclose(tempfd); X tempfd = infd[which] = fopen(TEMPFILE, "r"); X } X X /* X * If we wanted to be stingy with memory, we could realloc lentry down to X * its exact size (+3 for some odd reason) here. No need? X */ X len[which] = linect; X file[which] = lentry; X} X X X/* X * Look for initial and trailing sequences that have identical hash values. X * Don't bother building them into the candidate vector. X */ X Xsquish() X{ X register int i; X register LINE *ap; X register LINE *bp; X int j; X int k; X X /* X * prefix -> first line (from start) that doesn't hash identically X */ X i = 0; X ap = &fileA[1]; X bp = &fileB[1]; X while (i < lenA && i < lenB && ap->hash == bp->hash) { X i++; X ap++; X bp++; X } X prefix = i; X X /* X * suffix -> first line (from end) that doesn't hash identically X */ X j = lenA - i; X k = lenB - i; X ap = &fileA[lenA]; X bp = &fileB[lenB]; X i = 0; X while (i < j && i < k && ap->hash == bp->hash) { X i++; X ap--; X bp--; X } X suffix = i; X X /* X * Tuck the counts away X */ X for (k = 0; k <= 1; k++) { X sfile[k] = file[k] + prefix; X j = slen[k] = len[k] - prefix - suffix; X X for (i = 0, ap = sfile[k]; i <= slen[k]; i++, ap++) { X ap->serial = i; X } X } X} X X X/* X * Sort hash entries X */ X Xsort(vector, vecsize) X LINE *vector; /* What to sort */ X int vecsize; /* How many to sort */ X{ X register int j; X register LINE *aim; X register LINE *ai; X int mid; X int k; X LINE work; X X for (j = 1; j <= vecsize; j *= 2); X mid = (j - 1); X while ((mid /= 2) != 0) { X k = vecsize - mid; X for (j = 1; j <= k; j++) { X for (ai = &vector[j]; ai > vector; ai -= mid) { X aim = &ai[mid]; X if (aim < ai) X break; /* ?? Why ?? */ X if (aim->hash > ai->hash || X aim->hash == ai->hash && X aim->serial > ai->serial) X break; X work.hash = ai->hash; X ai->hash = aim->hash; X aim->hash = work.hash; X work.serial = ai->serial; X ai->serial = aim->serial; X aim->serial = work.serial; X } X } X } X} X X X/* X * Build equivalence class vector X */ X Xequiv() X{ X register LINE *ap; X union { X LINE *bp; X short *mp; X } r; X register int j; X LINE *atop; X X#ifdef DEBUG X printf("equiv entry\n"); X for (j = 1; j <= slenA; j++) X printf("sfileA[%d] = %6d %06o\n", X j, sfileA[j].serial, sfileA[j].hash); X for (j = 1; j <= slenB; j++) X printf("sfileB[%d] = %6d %06o\n", X j, sfileB[j].serial, sfileB[j].hash); X#endif /* DEBUG */ X j = 1; X ap = &sfileA[1]; X r.bp = &sfileB[1]; X atop = &sfileA[slenA]; X while (ap <= atop && j <= slenB) { X if (ap->hash < r.bp->hash) { X ap->hash = 0; X ap++; X } else if (ap->hash == r.bp->hash) { X ap->hash = j; X ap++; X } else { X r.bp++; X j++; X } X } X while (ap <= atop) { X ap->hash = 0; X ap++; X } X sfileB[slenB + 1].hash = 0; X#ifdef DEBUG X printf("equiv exit\n"); X for (j = 1; j <= slenA; j++) X printf("sfileA[%d] = %6d %06o\n", X j, sfileA[j].serial, sfileA[j].hash); X for (j = 1; j <= slenB; j++) X printf("sfileB[%d] = %6d %06o\n", X j, sfileB[j].serial, sfileB[j].hash); X#endif /* DEBUG */ X ap = &sfileB[0]; X atop = &sfileB[slenB]; X r.mp = &member[0]; X while (++ap <= atop) { X r.mp++; X *r.mp = -(ap->serial); X while (ap[1].hash == ap->hash) { X ap++; X r.mp++; X *r.mp = ap->serial; X } X } X r.mp[1] = -1; X#ifdef DEBUG X for (j = 0; j <= slenB; j++) X printf("member[%d] = %d\n", j, member[j]); X#endif /* DEBUG */ X} X X X/* X * Build class vector X */ X Xunsort() X{ X register int *temp; X register int *tp; X union { X LINE *ap; X short *cp; X } u; X LINE *evec; X short *eclass; X#ifdef DEBUG X int i; X#endif /* DEBUG */ X X temp = (int *) myalloc((slenA + 1) * sizeof(int), "unsort scratch"); X u.ap = &sfileA[1]; X evec = &sfileA[slenA]; X while (u.ap <= evec) { X#ifdef DEBUG X printf("temp[%2d] := %06o\n", u.ap->serial, u.ap->hash); X#endif /* DEBUG */ X temp[u.ap->serial] = u.ap->hash; X u.ap++; X } X X /* X * Copy into class vector and free work space X */ X u.cp = &class[1]; X eclass = &class[slenA]; X tp = &temp[1]; X while (u.cp <= eclass) X *u.cp++ = *tp++; X#ifndef OSK X free((char *) temp); X#else /* OSK */ X free((char *) temp - sizeof(int)); X#endif /* OSK */ X#ifdef DEBUG X printf("unsort exit\n"); X for (i = 1; i <= slenA; i++) X printf("class[%d] = %d %06o\n", i, class[i], class[i]); X#endif /* DEBUG */ X} X X X/* X * Generate maximum common subsequence chain in clist[] X */ X Xsubseq() X{ X int a; X register unsigned ktop; X register int b; X register int s; X unsigned r; X int i; X int cand; X X klist[0] = newcand(0, 0, -1); X klist[1] = newcand(slenA + 1, slenB + 1, -1); X ktop = 1; /* -> guard entry */ X for (a = 1; a <= slenA; a++) { X X /* X * For each non-zero element in fileA ... X */ X if ((i = class[a]) == 0) X continue; X cand = klist[0]; /* No candidate now */ X r = 0; /* Current r-candidate */ X do { X#ifdef DEBUG X printf("a = %d, i = %d, b = %d\n", a, i, member[i]); X#endif /* DEBUG */ X X /* X * Perform the merge algorithm X */ X if ((b = member[i]) < 0) X b = -b; X#ifdef DEBUG X printf("search(%d, %d, %d) -> %d\n", X r, ktop, b, search(r, ktop, b)); X#endif /* DEBUG */ X if ((s = search(r, ktop, b)) != 0) { X if (clist[klist[s]].b > b) { X klist[r] = cand; X r = s; X cand = newcand(a, b, klist[s - 1]); X#ifdef DEBUG X dumpklist(ktop, "klist[s-1]->b > b"); X#endif /* DEBUG */ X } X if (s >= ktop) { X klist[ktop + 1] = klist[ktop]; X ktop++; X#ifdef DEBUG X klist[r] = cand; X dumpklist(ktop, "extend"); X#endif /* DEBUG */ X break; X } X } X } while (member[++i] > 0); X klist[r] = cand; X } X#ifdef DEBUG X printf("last entry = %d\n", ktop - 1); X#endif /* DEBUG */ X return (ktop - 1); /* Last entry found */ X} X X Xint Xnewcand(a, b, pred) X int a; /* Line in fileA */ X int b; /* Line in fileB */ X int pred; /* Link to predecessor, index in cand[] */ X{ X register CANDIDATE *new; X X clength++; X if (++clength >= csize) { X csize += CSIZE_INC; X clist = (CANDIDATE *) compact((char *) clist, X csize * sizeof(CANDIDATE), X "extending clist"); X } X new = &clist[clength - 1]; X new->a = a; X new->b = b; X new->link = pred; X return (clength - 1); X} X X X/* X * Search klist[low..top] (inclusive) for b. If klist[low]->b >= b, X * return zero. Else return s such that klist[s-1]->b < b and X * klist[s]->b >= b. Note that the algorithm presupposes the two X * preset "fence" elements, (0, 0) and (slenA, slenB). X */ X Xsearch(low, high, b) X register unsigned low; X register unsigned high; X register int b; X{ X register int temp; X register unsigned mid; X X if (clist[klist[low]].b >= b) X return (0); X while ((mid = (low + high) / 2) > low) { X if ((temp = clist[klist[mid]].b) > b) X high = mid; X else if (temp < b) X low = mid; X else { X return (mid); X } X } X return (mid + 1); X} X X Xunravel(k) X register int k; X{ X register int i; X register CANDIDATE *cp; X int first_trailer; X int difference; X X first_trailer = lenA - suffix; X difference = lenB - lenA; X#ifdef DEBUG X printf("first trailer = %d, difference = %d\n", X first_trailer, difference); X#endif /* DEBUG */ X for (i = 0; i <= lenA; i++) { X match[i] = (i <= prefix) ? i X : (i > first_trailer) ? i + difference X : 0; X } X#ifdef DEBUG X printf("unravel\n"); X#endif /* DEBUG */ X while (k != -1) { X cp = &clist[k]; X#ifdef DEBUG X if (k < 0 || k >= clength) X error("Illegal link -> %d", k); X printf("match[%d] := %d\n", cp->a + prefix, cp->b + prefix); X#endif /* DEBUG */ X match[cp->a + prefix] = cp->b + prefix; X k = cp->link; X } X} X X X/* X * Check for hash matches (jackpots) and collect random access indices to X * the two files. X * X * It should be possible to avoid doing most of the ftell's by noticing X * that we are not doing a context diff and noticing that if a line X * compares equal to the other file, we will not ever need to know its X * file position. FIXME. X */ X Xcheck(fileAname, fileBname) X char *fileAname; X char *fileBname; X{ X register int a; /* Current line in file A */ X register int b; /* Current line in file B */ X int jackpot; X X/* X * The VAX C ftell() returns the address of the CURRENT record, not the X * next one (as in DECUS C or, effectively, other C's). Hence, the values X * are "off by one" in the array. OFFSET compensates for this. X */ X X#ifdef vms X#define OFFSET (-1) X#else /* !vms */ X#define OFFSET 0 X#endif /* vms */ X X b = 1; X rewind(infd[0]); X rewind(infd[1]); X/* X * See above; these would be over-written on VMS anyway. X */ X X#ifndef vms X oldseek[0] = ftell(infd[0]); X newseek[0] = ftell(infd[1]); X#endif /* vms */ X X jackpot = 0; X#ifdef DEBUG X printf("match vector\n"); X for (a = 0; a <= lenA; a++) X printf("match[%d] = %d\n", a, match[a]); X#endif /* DEBUG */ X for (a = 1; a <= lenA; a++) { X if (match[a] == 0) { X /* Unique line in A */ X oldseek[a + OFFSET] = ftell(infd[0]); X getline(infd[0], text); X continue; X } X while (b < match[a]) { X /* Skip over unique lines in B */ X newseek[b + OFFSET] = ftell(infd[1]); X getline(infd[1], textb); X b++; X } X X /* X * Compare the two, supposedly matching, lines. Unless we are going X * to print these lines, don't bother to remember where they are. We X * only print matching lines if a context diff is happening, or if a X * jackpot occurs. X */ X if (cflag) { X oldseek[a + OFFSET] = ftell(infd[0]); X newseek[b + OFFSET] = ftell(infd[1]); X } X getline(infd[0], text); X getline(infd[1], textb); X if (!streq(text, textb)) { X fprintf(stderr, "Spurious match:\n"); X fprintf(stderr, "line %d in %s, \"%s\"\n", X a, fileAname, text); X fprintf(stderr, "line %d in %s, \"%s\"\n", X b, fileBname, textb); X match[a] = 0; X jackpot++; X } X b++; X } X for (; b <= lenB; b++) { X newseek[b + OFFSET] = ftell(infd[1]); X getline(infd[1], textb); X } X/* X * The logical converse to the code up above, for NON-VMS systems, to X * store away an fseek() pointer at the beginning of the file. For VMS, X * we need one at EOF... X */ X X#ifdef vms X oldseek[lenA] = ftell(infd[0]); X getline(infd[0], text); /* Will hit EOF... */ X newseek[lenB] = ftell(infd[1]); X getline(infd[1], textb); /* Will hit EOF... */ X#endif /* vms */ X X return (jackpot); X} X X Xoutput(fileAname, fileBname) X char *fileAname, *fileBname; X{ X register int astart; X register int aend = 0; X int bstart; X register int bend; X X rewind(infd[0]); X rewind(infd[1]); X match[0] = 0; X match[lenA + 1] = lenB + 1; X if (!eflag) { X if (cflag) { X X /* X * Should include ctime style dates after the file names, but X * this would be non-trivial on OSK. Perhaps there should be a X * special case for stdin. X */ X printf("*** %s\n--- %s\n", fileAname, fileBname); X } X X /* X * Normal printout X */ X for (astart = 1; astart <= lenA; astart = aend + 1) { X X /* X * New subsequence, skip over matching stuff X */ X while (astart <= lenA X && match[astart] == (match[astart - 1] + 1)) X astart++; X X /* X * Found a difference, setup range and print it X */ X bstart = match[astart - 1] + 1; X aend = astart - 1; X while (aend < lenA && match[aend + 1] == 0) X aend++; X bend = match[aend + 1] - 1; X match[aend] = bend; X change(astart, aend, bstart, bend); X } X } else { X X /* X * Edit script output -- differences are output "backwards" for the X * benefit of a line-oriented editor. X */ X for (aend = lenA; aend >= 1; aend = astart - 1) { X while (aend >= 1 X && match[aend] == (match[aend + 1] - 1) X && match[aend] != 0) X aend--; X bend = match[aend + 1] - 1; X astart = aend + 1; X while (astart > 1 && match[astart - 1] == 0) X astart--; X bstart = match[astart - 1] + 1; X match[astart] = bstart; X change(astart, aend, bstart, bend); X } X } X if (lenA == 0) X change(1, 0, 1, lenB); X} X X X/* X * Output a change entry: fileA[astart..aend] changed to fileB[bstart..bend] X */ X Xchange(astart, aend, bstart, bend) X int astart; X int aend; X int bstart; X int bend; X{ X char c; X X /* X * This catches a "dummy" last entry X */ X if (astart > aend && bstart > bend) X return; X c = (astart > aend) ? 'a' : (bstart > bend) ? 'd' : 'c'; X if (cflag) X fputs("**************\n*** ", stdout); X X if (c == 'a' && !cflag) X range(astart - 1, astart - 1, 0); /* Addition: just print one X * odd # */ X else X range(astart, aend, 0); /* Print both, if different */ X if (!cflag) { X putchar(c); X if (!eflag) { X if (c == 'd') X range(bstart - 1, bstart - 1, 1); /* Deletion: just print X * one odd # */ X else X range(bstart, bend, 1); /* Print both, if different */ X } X } X putchar('\n'); X if ((!eflag && c != 'a') || cflag) { X fetch(oldseek, astart, aend, lenA, infd[0], X cflag ? (c == 'd' ? "- " : "! ") : "< "); X if (cflag) { X fputs("--- ", stdout); X range(bstart, bend, 1); X fputs(" -----\n", stdout); X } else if (astart <= aend && bstart <= bend) X printf("---\n"); X } X fetch(newseek, bstart, bend, lenB, infd[1], X cflag ? (c == 'a' ? "+ " : "! ") : (eflag ? "" : "> ")); X if (eflag && bstart <= bend) X printf(".\n"); X} X X X/* X * Print a range X */ X Xrange(from, to, w) X int from; X int to; X int w; X{ X if (cflag) { X if ((from -= cflag) <= 0) X from = 1; X if ((to += cflag) > len[w]) X to = len[w]; X } X if (to > from) { X printf("%d,%d", from, to); X } else if (to < from) { X printf("%d,%d", to, from); X } else { X printf("%d", from); X } X} X X X/* X * Print the appropriate text X */ X Xfetch(seekvec, start, end, trueend, fd, pfx) X long *seekvec; X register int start; X register int end; X int trueend; X FILE *fd; X char *pfx; X{ X register int i; X register int first; X register int last; X X if (cflag) { X if ((first = start - cflag) <= 0) X first = 1; X if ((last = end + cflag) > trueend) X last = trueend; X } else { X first = start; X last = end; X } X if (fseek(fd, seekvec[first], 0) != 0) { X printf("?Can't read line %d at %08lx (hex) in file%c\n", X start, seekvec[first], X (fd == infd[0]) ? 'A' : 'B'); X } else { X for (i = first; i <= last; i++) { X if (fgetss(text, sizeof text, fd) == NULL) { X printf("** Unexpected end of file\n"); X break; X } X#ifdef DEBUG X printf("%5d: %s%s\n", i, pfx, text); X#else /* !DEBUG */ X fputs((cflag && (i < start || i > end)) ? " " : pfx, stdout); X fputs(text, stdout); X putchar('\n'); X#endif /* DEBUG */ X } X } X} X X X/* X * Input routine, read one line to buffer[], return TRUE on eof, else FALSE. X * The terminating newline is always removed. If "-b" was given, trailing X * whitespace (blanks and tabs) are removed and strings of blanks and X * tabs are replaced by a single blank. Getline() does all hacking for X * redirected input files. X */ X Xint Xgetline(fd, buffer) X FILE *fd; X char *buffer; X{ X register char *top; X register char *fromp; X register char c; X X if (fgetss(buffer, sizeof text, fd) == NULL) { X *buffer = EOS; X return (TRUE); X } X if (fd == stdin) X fputss(buffer, tempfd); X if (bflag || iflag) { X top = buffer; X fromp = buffer; X while ((c = *fromp++) != EOS) { X if (bflag && (c == ' ' || c == '\t')) { X c = ' '; X while (*fromp == ' ' || *fromp == '\t') X fromp++; X } X if (iflag) X c = tolower(c); X *top++ = c; X } X if (bflag && top[-1] == ' ') X top--; X *top = EOS; X } X return (FALSE); X} X X Xstatic unsigned short crc16a[] = { X 0000000, 0140301, 0140601, 0000500, X 0141401, 0001700, 0001200, 0141101, X 0143001, 0003300, 0003600, 0143501, X 0002400, 0142701, 0142201, 0002100, X}; X Xstatic unsigned short crc16b[] = { X 0000000, 0146001, 0154001, 0012000, X 0170001, 0036000, 0024000, 0162001, X 0120001, 0066000, 0074000, 0132001, X 0050000, 0116001, 0104001, 0043000, X}; X X X/* X * Return the CRC16 hash code for the buffer X * Algorithm from Stu Wecker (Digital memo 130-959-002-00). X */ X Xunsigned short Xhash(buffer) X char *buffer; X{ X register unsigned short crc; X register char *tp; X register short temp; X X crc = 0; X for (tp = buffer; *tp != EOS;) { X temp = *tp++ ^ crc; /* XOR crc with new char */ X crc = (crc >> 8) X ^ crc16a[(temp & 0017)] X ^ crc16b[(temp & 0360) >> 4]; X } X#ifdef DEBUG_ALL X printf("%06o: %s\n", (crc == 0) ? 1 : crc, buffer); X#endif /* DEBUG_ALL */ X return ((crc == 0) ? (unsigned short) 1 : crc); X} X X X#ifdef vms Xopendir(which, arg, okfd) X int which; /* Which file to open (0 or 1) */ X char **arg; /* File name argument, &argv[which] */ X FILE *okfd; /* File name (already open) */ X{ X register char *tp; X register int c; X register char *newname; X X fgetname(okfd, text); X X /* X * Skip over device name X */ X for (tp = text; (c = *tp) && c != ':'; tp++); X if (c) X tp++; X else X tp = text; X X /* X * Skip over [UIC] or [PPN] if present X */ X if (*tp == '[' || *tp == '(') { X while ((c = *tp++) && c != ']' && c != ')'); X if (c == 0) { X fprintf(stderr, "?Bug: bad file name \"%s\"\n", X text); X tp--; X } X } X strcpy(text, tp); X X /* X * Don't include version X */ X for (tp = text; (c = *tp) && c != ';'; tp++); X *tp = 0; X X /* X * Now, text has the file name, tp - text, its length, and *arg the X * (possible) directory name. Create a new file name for opening. X */ X#ifndef OSK X if ((newname = malloc(tp - text + strlen(*arg) + 1)) == NULL) X error("Out of space at start"); X#ifdef AMIGA X savsiz = tp - text + strlen(*arg) + 1; X savptr = newname; X#endif /* AMIGA */ X#else /* OSK */ X newname = myalloc(tp - text + strlen(*arg) + 1, "Out of space at start"); X#endif /* OSK */ X concat(newname, *arg, text, NULL); X if ((infd[which] = fopen(newname, "r")) == NULL) X cant(*arg, "constructed input", 1); X else X *arg = newname; X} X/* Amiga C doesn't have all these extensions for directory... */ X#endif /* vms */ X X X/* X * Allocate or crash. X */ X Xchar * Xmyalloc(amount, why) X unsigned amount; X char *why; X{ X register char *pointer; X X#ifdef OSK X amount += sizeof(int); X#endif /* OSK */ X if ((pointer = malloc((unsigned) amount)) == NULL) X noroom(why); X#ifdef OSK X *((int *) pointer) = amount; X pointer += sizeof(int); X#ifdef DEBUG X fprintf(stderr, "Myalloc: %d at %06o\n", amount, pointer); X#endif /* DEBUG */ X#endif /* OSK */ X#ifdef AMIGA X savsiz = amount; X savptr = pointer; X#endif /* AMIGA */ X X return (pointer); X} X X X/* X * Reallocate pointer, compacting storage X * X * The "compacting storage" part is probably not relevant any more. X * There used to be horrid code here that malloc'd one byte and freed X * it at magic times, to cause garbage collection of the freespace X * or something. It's safely gone now, you didn't have to see it. X * -- John Gilmore, Nebula Consultants, Sept 26, 1986 X */ X Xchar * Xcompact(pointer, new_amount, why) X char *pointer; X unsigned new_amount; X char *why; X{ X register char *new_pointer; X X#ifndef AMIGA X#ifndef OSK X#ifdef TURBO X extern void *realloc(); X#else /* !TURBO */ X extern char *realloc(); X#endif /* TURBO */ X X if ((new_pointer = realloc(pointer, (unsigned) new_amount)) == NULL) { X noroom(why); X } X#else /* OSK */ X register int old_amount; X new_amount += sizeof(int); X if ((new_pointer = malloc(new_amount)) == NULL) X noroom(why); X *(int *) new_pointer = new_amount; X new_pointer += sizeof(int); X old_amount = *(((int *) pointer) - 1); X /* _strass is like bcopy with the first two arguments reversted */ X _strass(new_pointer, pointer, (new_amount <= old_amount ? X new_amount : old_amount) - sizeof(int)); X#ifdef DEBUG X fprintf(stderr, "compact %d to %d from %06o to %06o\n", X old_amount, new_amount, pointer, new_pointer); X#endif /* DEBUG */ X free(pointer - sizeof(int)); X#endif /* OSK */ X#else /* AMIGA */ X X /* X * This routine is heavily dependent on C storage allocator hacks For X * Amiga, we can't rely on storage being left alone "up to" the boundary X * of allocation as in VMS or RSX. Therefore we have to be different here X * and allocate a new larger segment, then free the old one. Messy but X * hopefully it will work. X */ X extern char *malloc(); X X /* No realloc(). Do a malloc and copy it. */ X if ((new_pointer = malloc((unsigned) new_amount)) == NULL) { X noroom(why); X } X /* This MUST assume the program calls compact using the old pointer as the X last call of malloc... Reason is that RSX version is really simpleminded */ X cpysiz = savsiz; X /* Complain if deallocate area not same as last allocate area */ X if (savptr != pointer) X bogus(why); X wrk2 = new_pointer; X for (wrk = pointer; cpysiz > 0; cpysiz--) { X /* copy data to new area */ X *wrk2++ = *wrk++; X } X /* when done, free old memory area. */ X free(pointer); X#endif /* AMIGA */ X X#ifndef OSK X#ifdef DEBUG X if (new_pointer != pointer) { X fprintf(stderr, "moved from %06o to %06o\n", X pointer, new_pointer); X } X/* rdump(new_pointer, why); X*/ X#endif /* DEBUG */ X#endif /* OSK */ X return (new_pointer); X} X X Xnoroom(why) X char *why; X{ X fprintf(stderr, "?DIFF-F-out of room when %s\n", why); X exit(IO_ERROR); X} X X X#ifdef AMIGA Xbogus(why) X char *why; X{ X fprintf(stderr, "?DIFF-F-invalid compaction when %s\n", why); X exit(IO_ERROR); X} X#endif /* AMIGA */ X X X#ifdef DEBUG X/* X * Dump memory block X */ X Xrdump(pointer, why) X int *pointer; X char *why; X{ X int *last; X int count; X X last = ((int **) pointer)[-1]; X fprintf(stderr, "dump %s of %06o -> %06o, %d words", X why, pointer, last, last - pointer); X last = (int *) (((int) last) & ~1); X for (count = 0; pointer < last; ++count) { X if ((count & 07) == 0) { X fprintf(stderr, "\n%06o", pointer); X } X fprintf(stderr, "\t%06o", *pointer); X pointer++; X } X fprintf(stderr, "\n"); X} X#endif /* DEBUG */ X X X/* X * Can't open file message X */ X Xcant(filename, what, fatalflag) X char *filename; X char *what; X int fatalflag; X{ X fprintf(stderr, "Can't open %s file \"%s\": ", what, filename); X#ifndef OSK X perror((char *) NULL); X#else X prerr(0, errno); X#endif X if (fatalflag) { X exit(fatalflag); X } X} X X X#ifdef DEBUG Xdump(d_linep, d_len, d_which) X LINE *d_linep; X int d_len; X int d_which; X{ X register int i; X X printf("Dump of file%c, %d elements\n", "AB"[d_which], d_len); X printf("linep @ %06o\n", d_linep); X for (i = 0; i <= d_len; i++) { X printf("%3d %6d %06o\n", i, X d_linep[i].serial, d_linep[i].hash); X } X} X X X/* X * Dump klist X */ X Xdumpklist(kmax, why) X int kmax; X char *why; X{ X register int i; X register CANDIDATE *cp; X register int count; X X printf("\nklist[0..%d] %s, clength = %d\n", kmax, why, clength); X for (i = 0; i <= kmax; i++) { X cp = &clist[klist[i]]; X printf("%2d %2d", i, klist[i]); X if (cp >= &clist[0] && cp < &clist[clength]) X printf(" (%2d %2d -> %2d)\n", cp->a, cp->b, cp->link); X else if (klist[i] == -1) X printf(" End of chain\n"); X else X printf(" illegal klist element\n"); X } X for (i = 0; i <= kmax; i++) { X count = -1; X for (cp = (CANDIDATE *) klist[i]; cp > &clist[0]; X cp = (CANDIDATE *) & cp->link) { X if (++count >= 6) { X printf("\n "); X count = 0; X } X printf(" (%2d: %2d,%2d -> %d)", X cp - clist, cp->a, cp->b, cp->link); X } X printf("\n"); X } X printf("*\n"); X} X#endif /* DEBUG */ X X X X#ifdef TIMING X X/* X * Dump time buffer X */ X Xptime(why) X char *why; X{ X long ttemp; X X ttemp = time(NULL); X printf("%ld seconds for %s\n", X ttemp - sectiontime, why); X sectiontime = ttemp; X} X#endif /* TIMING */ X X X/* X * TRUE if strings are identical X */ X Xint Xstreq(s1, s2) X register char *s1; X register char *s2; X{ X while (*s1++ == *s2) { X if (*s2++ == EOS) X return (TRUE); X } X return (FALSE); X} X X X/* X * Error message before retiring. X */ X X/* VARARGS */ Xerror(format, args) X char *format; X{ X fprintf(stderr, format, &args); X putc('\n', stderr); X _error(); X} X X X_error() X{ X exit(1); X} X X X/* X * Like fput() except that it puts a newline at the end of the line. X */ X Xfputss(s, iop) X register char *s; X register FILE *iop; X{ X fputs(s, iop); X putc('\n', iop); X} X X X/* X * Fgetss() is like fgets() except that the terminating newline X * is removed. X */ X Xchar * Xfgetss(s, n, iop) X char *s; X register FILE *iop; X{ X register char *cs; X X if (fgets(s, n, iop) == NULL) X return ((char *) NULL); X cs = s + strlen(s) - 1; X if (*cs == '\n') X *cs = '\0'; X return (s); X} END_OF_diff.c if test 33595 -ne `wc -c diff.doc <<'END_OF_diff.doc' X X X X CCCCDDDDIIIIFFFFFFFF((((1111)))) UUUUNNNNIIIIXXXX 5555....0000 CCCCDDDDIIIIFFFFFFFF((((1111)))) X X X X NNNNAAAAMMMMEEEE X cdiff - Public Domain cdiff (context diff) program X X SSSSYYYYNNNNOOOOPPPPSSSSIIIISSSS X ddddiiiiffffffff [ ----bbbb ----cccc ----iiii ----eeee ] file1 file2 X X DDDDEEEESSSSCCCCRRRRIIIIPPPPTTTTIIIIOOOONNNN X _D_i_f_f compares two files, showing what must be changed to X make them identical. Either file1 or file2 (but not both) X may refer to directories. If that is the case, a file in the X directory whose name is the same as the other file argument X will be used. The standard input may be used for one of the X files by replacing the argument by "-". Except for the X standard input, both files must be on disk devices. X X OOOOPPPPTTTTIIIIOOOONNNNSSSS X ----bbbb Remove trailing whitespace (blanks and tabs) and compress X all other strings of whitespace to a single blank. X X ----cccc Print some context -- matching lines before and after the X non-match section. Mark non-matched sections with "|". X X ----iiii Ignore lower/upper case distinctions. X X ----eeee Output is in an "editor script" format which is X compatible with the Unix 'ed' editor. X X All information needed to compare the files is maintained in X main memory. This means that very large files (or fairly X large files with many differences) will cause the program to X abort with an "out of space" message. Main memory X requirements (in words) are approximately: X X 2 * (length of file1 + length of file2) X + 3 * (number of changes) X X (Where "length" is the number of lines of data in each X file.) X X The algorithm reads each file twice, once to build hash X tables and once to check for fortuitous matches (two lines X that are in fact different, but which have the same hash X value). CPU time requirements include sorting the hash X tables and randomly searching memory tables for equivalence X classes. For example, on a time-shared VAX-11/780, comparing X two 1000 line files required about 30 seconds (elapsed clock X time) and about 10,000 bytes of working storage. About 90 X per-cent of the time was taken up by file I/O. X X DDDDIIIIAAAAGGGGNNNNOOOOSSSSTTTTIIIICCCCSSSS X Warning, bad option 'x' X The option is ignored. X X X X Page 1 (printed 1/13/88) X X X X X X X CCCCDDDDIIIIFFFFFFFF((((1111)))) UUUUNNNNIIIIXXXX 5555....0000 CCCCDDDDIIIIFFFFFFFF((((1111)))) X X X X Usage ... X Two input files were not specified. X X Can't open input file "filename". X Can't continue. X X Out of space X The program ran out of memory while comparing the two X files. X X Can't read line nnn at xxx in file[A/B] X This indicates an I/O error when seeking to the X specific line. It should not happen. X X Spurious match, output is not optimal. X Two lines that were different yielded the same hash X value. This is harmless except that the difference X output is not the minimum set of differences between X the two files. For example, instead of the output: X lines 1 to 5 were changed to ... X the program will print X lines 1 to 3 were changed to ... X lines 4 to 5 were changed to ... X X The program uses a CRC16 hash code. X The likelihood of this error is quite small. X X AAAAUUUUTTTTHHHHOOOORRRR X The diff algorithm was developed by J. W. Hunt and M. D. X McIlroy, using a central algorithm defined by H. S. Stone. X It was published in: X Hunt, J. W., and McIlroy, M. D., X An Algorithm for Differential File Comparison, X Computing Science Technical Report #41, X Bell Laboratories, Murray Hill, NJ 07974 X X BBBBUUUUGGGGSSSS X On RSX and DECUS C on VMS systems, diff may fail if the both X files are not "variable-length, implied carriage control" X format. The scopy program can be used to convert files to X this format if problems arise. X X When compiled under VAX C, diff handles STREAM_LF files X properly (in addition to the canonical variable-length X implied carriage control files). Other variations should X work, but have not been tested. X X When compiled under VAX C, diff is quite slow for unknown X reasons which ought to be investigated. On the other hand, X it has access to effectively unlimited memory. X X Output in a form suitable for ed - the -e option - seems X X X X Page 2 (printed 1/13/88) X X X X X X X CCCCDDDDIIIIFFFFFFFF((((1111)))) UUUUNNNNIIIIXXXX 5555....0000 CCCCDDDDIIIIFFFFFFFF((((1111)))) X X X X rather pointless; the analogue on DEC systems is SLP (SUMSLP X on VMS). It would be simple to provide SLP-compatible X output. The question is, why bother - since the various DEC X file comparisonFound 424 control chars in "diff.doc" utilities already produce it. X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X Page 3 (printed 1/13/88) X X X END_OF_diff.doc echo shar: 424 control characters may be missing from \"diff.doc\" if test 6151 -ne `wc -c diff.mem <<'END_OF_diff.mem' X X XJan 13 13:00 1988 CDIFF.MEM Page 1 X X X X Diff maintains all information needed to compare the two files in X main memory. This means that very large files (or fairly large X files with many differences) will cause the program to abort with X an "out of space" error. Main memory requirements (in words) are X approximately: X X 2 * (length of file1 + length of file2) + (3 * number of changes) X X The diff algorithm reads each file twice (once to build hash X tables and a second time to check for fortuitous matches), then X reads the differences by seeking randomly within the files. CPU X time requirements include sorting the two hash vectors and X randomly searching memory tables for equivalence classes. For X example, running in Vax compatibility mode, two 1000 line files X with a fair number of differences took about 25 seconds (elapsed X wall clock time) for processing. Most of this time was spent in X the file read routines. This test required slightly more than X 6000 words of memory for internal tables. X X The diff algorithm was developed by J. W. Hunt and M. D. McIlroy, X using a central algorithm defined by H. S. Stone. The algorithm X was described in: X X Hunt, J. W., and McIlroy, M. D., X An Algorithm for Differential File Comparison, X Computing Science Technical Report #41, X Bell Laboratories, Murray Hill, NJ 07974 X X The following description is summarized from that document. While X it has been slightly modified to correspond to the program X source, the algorithm is essentially identical. X X 1. Read the input files, building two vectors containing the line X number (serial) and hash value (hash) of each line. Data for X fileA will be in a vector pointed to by fileA[], while data for X fileB will be pointed to by fileB[]. The lengths (number of X lines) of the files will be represented by lenA and lenB X respectively. [This is slightly different from the published X algorithm.] X X 2. Note initial and final sequences that have identical hash X values to shorten subsequent processing. Note that the "jackpot" X phase (step 9.) will examine all lines in the file. Next, sort X each file using hash as the primary key and serial as the X secondary key. X X 3. Construct an array of equivalence classes (member[1..lenB]) X where each element contains the line number in fileB and a flag X which is True if the indicated line is the first member of an X equivalence class. (An equivalence class is a set of lines which X all hash to the same value. The lines themselves are not X necessarily identical.) X X 4. Construct an array, class[1..lenA], where each element, I, is X set to the index of a line, J, in fileB if member[J] is the first X X X X X X X XJan 13 13:00 1988 CDIFF.MEM Page 2 X X X element in an equivalence class and the hash code of line[I] in X fileA is the same as the hash code of line[J] in fileB. Class[I] X is set to zero if no such line exists. X X If non-zero, class[I] now points in member[] to the beginning of X the class of lines in fileB equivalent to line[I] in fileA. X X The next two steps implement the longest common subsequence X algorithm. X X 5. Structure CANDIDATE { a, b, previous }, where a and b are line X numbers and previous a reference to a candidate, will store X candidate lists as they are constructed. X X Vector clist[] stores references to candidates. It is dimensioned X (0..min(lenA, lenB) + 1) X X Initialize X clist[0] = CANDIDATE { 0, 0, -1 }; X clist[1] = CANDIDATE { A+1, B+1, -1 }; X ktop = 1; X X clist[1] is a fence beyond the last usefully filled element and X -1 is an out-of-range clist index. Ktop is the index of the X fence. Note, because of the memory allocation used, clist[] is X actually composed of two vectors, clist[] containing the X candidate reference, and klist[] containing pointers to clist. X X 6. For (A = 1 to lenA) { X I = class[A]; -- the index in member[]: beginning of X -- the class of lines in fileB equivalent X -- to this line in fileA. X if (I is non-zero) { X Merge each member into the candidate list X as discussed below. X } X X Unravel the chain of candidates, getting a vector of common X subsequences: X X 7. Set all elements of match[0..lenA] to zero. X X 8. clist[ktop-1] points to the subsequence chain head. For each X element in the chain, let A and B be the line number entries. X Then, set X X match[A] = B; X X The non-zero elements of match[] now pick out a longest common X subsequence chain, possibly including spurious matches due to X hash coincidences. The pairings between the two files are: X X if (match[A] is non-zero) { X line A in fileA matches line match[A] in fileB; X } X X X X X X X X XJan 13 13:00 1988 CDIFF.MEM Page 3 X X X Now, read each line of fileA and fileB to check for jackpots: X X 9. for (A = 1 to lenA) { X if (match[A] is nonzero) { X if (fileA[A] is not identical to fileB[match[A]]) X match[A] = 0; -- Hash congruence X } X } X X Ignoring "squish" complications, the merge step may be defined as X follows: X X Entry: X clist[] Candidate pointer array X ktop Fence beyond last filled index X A Current index in fileA X member[] Equivalence vector X I The index in member[] of the first element X of the class of lines in fileB that are X equivalent to line[A] in fileA. X X 1. Let clist[R] be "an r-candidate" and C a reference to the last X candidate found, which will always be an r-candidate. clist[R] X will be updated with this reference once the previous value of X clist[R] is no longer needed. Initialize: X X R = 0; C = clist[0]; X X 2. Do steps 3 through 6 repeatedly: X X 3. set B to the line number in member[I]; search clist[R..ktop] X for an element, clist[S], such that X X clist[S-1].b < B and clist[S].b >= B X X Note that clist[] is ordered on clist[].b so that binary search X will work. The search algorithm used requires the two "fence" X entries described above. X X If such an element is found, perform steps 4. and 5. X X 4. Extend the longest common subsequence chain: X X If (clist[S].b > B) { X clist[R] = C; X R = S; X C = candidate(A, B, clist[S - 1]); X } X X 5. Extend the number of subsequences, moving the fence: X X If (S == ktop) { X clist[ktop + 1] = clist[ktop] X ktop = ktop + 1; X break out of step 2's loop; X } X X X X X X X XJan 13 13:00 1988 CDIFF.MEM Page 4 X X X X 6. I = I + 1; X if (member[I] is the first element in another class) { X break out of step 2's loop; X } X else { X continue at step 2. X } X X 7. clist[R] = C; exit merge subroutine. X X To illustrate vector contents, here is a sample run: X X File A: X line 1 X line 2 X line 3 X line 4 X line 5 gets deleted X line 6 X X File B: X line 1 X line 1.5 inserted X line 2 X line 3 changed X line 4 X line 6 X X (For clarity, the "squish" step is omitted from the following) X X On entry to equiv() (after readin and sorting), the file[] vector X is as follows (the first entry in each pair is the line number, X the second is the hash value. Entries are sorted on hash value): X X FileA[] (1..lines in fileA): X line hash X 3 042400 6 043300 5 050026 1 102201 2 102701 4 103501 X FileB[] (1..lines in fileB): X 6 043300 2 045600 1 102201 3 102701 5 103501 4 147166 X X X After Equiv has processed file[]: X X FileA[] (1..lines in fileA): X line value X 3 0 6 1 5 0 1 3 2 4 4 5 X Member[] (0..lines in fileB) X 0 -6 -2 -1 -3 -5 -4 X X After unsort() has unwound fileB: X X Class[] (1 .. lines in fileA): X 3 4 0 5 0 1 X X Within unravel(), match is built in the following order: X X X X X X X XJan 13 13:00 1988 CDIFF.MEM Page 5 X X X X match[6] := 6 X match[4] := 5 X match[2] := 3 X match[1] := 1 X X Match[] (0 .. lines in fileA): X X 0 1 3 0 5 0 6 X X Output is as follows: X X 1a2 X > line 1.5 inserted X 3c4 X < line 3 X --- X > line 3 changed X 5d5 X < line 5 gets deleted X X******************************************************************** X X/* X * s t r e q . c X */ X X XString Equality Test XString equality test X XSynopsis: X streq(a, b); X char *a; X char *b; X XDescription: X X Return TRUE if the strings are equal. X XBugs X X*************************************************************** X X/* X * e r r o r . c X */ X X XFatal Error Exit X X Synopsis: X X _error() X X error(format, args) X X X X X X X XJan 13 13:00 1988 CDIFF.MEM Page 6 X X X char *format; X X Documentation: X X Fatal error exits. _error() halts, error() prints something X on stderr and then halts. X X Bugs: X X THIS DOES NOT WORK ON MANY SYSTEMS DUE TO EXTREMLY NON-PORTABLE CODE. X Why oh why can't people learn to use varargs properly? This code will X blow up on OSK. Fortunatly, it isn't used often... X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X END_OF_diff.mem if test 9327 -ne `wc -c