Jan 13 13:00 1988 CDIFF.MEM Page 1
Diff maintains all information needed to compare the two files in
main memory. This means that very large files (or fairly large
files with many differences) will cause the program to abort with
an "out of space" error. Main memory requirements (in words) are
approximately:
2 * (length of file1 + length of file2) + (3 * number of changes)
The diff algorithm reads each file twice (once to build hash
tables and a second time to check for fortuitous matches), then
reads the differences by seeking randomly within the files. CPU
time requirements include sorting the two hash vectors and
randomly searching memory tables for equivalence classes. For
example, running in Vax compatibility mode, two 1000 line files
with a fair number of differences took about 25 seconds (elapsed
wall clock time) for processing. Most of this time was spent in
the file read routines. This test required slightly more than
6000 words of memory for internal tables.
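As a quick check of the estimate above, the formula can be written as a
small C function. This is only a sketch: the name diff_words_needed and
its parameters are invented for illustration and do not come from the
diff source.

```c
#include <assert.h>

/*
 * Approximate main-memory requirement, in words, following the formula
 * in the text:
 *     2 * (length of file1 + length of file2) + 3 * (number of changes)
 * Lengths are counted in lines. Hypothetical helper, not part of diff.
 */
long diff_words_needed(long len1, long len2, long changes)
{
    assert(len1 >= 0 && len2 >= 0 && changes >= 0);
    return 2L * (len1 + len2) + 3L * changes;
}
```

For two 1000 line files the fixed term is 2 * (1000 + 1000) = 4000
words, and every change adds 3 words on top of that.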
The diff algorithm was developed by J. W. Hunt and M. D. McIlroy,
using a central algorithm defined by H. S. Stone. The algorithm
was described in:
Hunt, J. W., and McIlroy, M. D.,
An Algorithm for Differential File Comparison,
Computing Science Technical Report #41,
Bell Laboratories, Murray Hill, NJ 07974
The following description is summarized from that document. While
it has been slightly modified to correspond to the program
source, the algorithm is essentially identical.
1. Read the input files, building two vectors containing the line
number (serial) and hash value (hash) of each line. Data for
fileA will be in a vector pointed to by fileA[], while data for
fileB will be pointed to by fileB[]. The lengths (number of
lines) of the files will be represented by lenA and lenB
respectively. [This is slightly different from the
+ 3 * (number of changes)
(Where "length" is the number of lines of data in each
file.)
The algorithm reads each file twice, once to build hash
tables and once to check for fortuitous matches (two lines
that are in fact different, but which have the same hash
value). CPU time requirements include sorting the hash
tables and randomly searching memory tables for equivalence
classes. For example, on a time-shared VAX-11/780, comparing
two 1000 line files required about 30 seconds (elapsed clock
time) and about 10,000 bytes of working storage. About 90
percent of the time was taken up by file I/O.
DIAGNOSTICS
Warning, bad option 'x'
The option is ignored.
Page 1 (printed 1/13/88)
CDIFF(1) UNIX 5.0 CDIFF(1)
Usage ...
Two input files were not specified.
Can't open input file "filename".
Can't continue.
Out of space
The program ran out of memory while comparing the two
files.
Can't read line nnn at xxx in file[A/B]
This indicates an I/O error when seeking to the
specific line. It should not happen.
Spurious match, output is not optimal.
Two lines that were different yielded the same hash
value. This is harmless except that the difference
output is not the minimum set of differences between
the two files. For example, instead of the output:
lines 1 to 5 were changed to ...
the program will print
lines 1 to 3 were changed to ...
lines 4 to 5 were changed to ...
The program uses a CRC16 hash code.
The likelihood of this error is quite small.
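The "spurious match" condition can be illustrated with a short C
sketch. The text only says that a CRC16 hash code is used; the variant
below (bitwise, reflected polynomial 0xA001, initial value 0) and the
names line_hash and spurious_match are assumptions made for this
example, not taken from the diff source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * CRC-16 of one text line, computed bit by bit.
 * ASSUMPTION: reflected polynomial 0xA001 with initial value 0; the
 * documentation does not say which CRC16 variant diff really uses.
 */
unsigned short line_hash(const char *line)
{
    unsigned int crc = 0;
    size_t i;
    int bit;

    assert(line != NULL);
    for (i = 0; line[i] != '\0'; i++) {
        crc ^= (unsigned char)line[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xA001u : crc >> 1;
    }
    return (unsigned short)(crc & 0xFFFFu);
}

/*
 * A spurious (fortuitous) match: the hash values agree although the
 * lines themselves differ. Returns 1 in that case, 0 otherwise.
 */
int spurious_match(const char *a, const char *b)
{
    return line_hash(a) == line_hash(b) && strcmp(a, b) != 0;
}
```

With a 16-bit hash, two unrelated lines collide with a probability of
about 1 in 65536, which is consistent with the remark that the error is
unlikely.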
AUTHOR
The diff algorithm was developed by J. W. Hunt and M. D.
McIlroy, using a central algorithm defined by H. S. Stone.
It was published in:
Hunt, J. W., and McIlroy, M. D.,
An Algorithm for Differential File Comparison,
Computing Science Technical Report #41,
Bell Laboratories, Murray Hill, NJ 07974
BUGS
On RSX and DECUS C on VMS systems, diff may fail if both
files are not "variable-length, implied carriage control"
format. The scopy program can be used to convert files to
this format if problems arise.
When compiled under VAX C, diff handles STREAM_LF files
properly (in addition to the canonical variable-length
implied carriage control files). Other variations should
work, but have not been tested.
When compiled under VAX C, diff is quite slow for unknown
reasons which ought to be investigated. On the other hand,
it has access to effectively unlimited memory.
Output in a form suitable for ed - the -e option - seems
rather pointless; the analogue on DEC systems is SLP (SUMSLP
on VMS). It would be simple to provide SLP-compatible
output. The question is, why bother - since the various DEC
file comparison
utilities already produce it.
-1 -3 -5 -4
After unsort() has unwound fileB:
Class[] (1 .. lines in fileA):
3 4 0 5 0 1
Within unravel(), match is built in the following order:
match[6] := 6
match[4] := 5
match[2] := 3
match[1] := 1
Match[] (0 .. lines in fileA):
0 1 3 0 5 0 6
Output is as follows:
1a2
> line 1.5 inserted
3c4
< line 3
---
> line 3 changed
5d5
< line 5 gets deleted
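The relation between Match[] and the command lines in the output above
(1a2, 3c4, 5d5) can be sketched in C. This is not the program's own
output routine, which is not shown here; list_changes and fmt_range are
hypothetical helpers. The sketch assumes match[i] holds the fileB line
paired with fileA line i (0 if unpaired), that the non-zero entries are
increasing, and it produces only the command lines, not the "<" and ">"
text lines.

```c
#include <assert.h>
#include <stdio.h>

/* Format a non-empty line range as "n" or "n,m". */
static void fmt_range(char *buf, size_t size, int first, int last)
{
    if (first == last)
        snprintf(buf, size, "%d", first);
    else
        snprintf(buf, size, "%d,%d", first, last);
}

/*
 * Walk match[0..lenA] and write one command per difference into out,
 * one per line. Returns the number of commands, or -1 if out is too
 * small.
 */
int list_changes(const int *match, int lenA, int lenB,
                 char *out, size_t outsize)
{
    int i = 1, j = 1, count = 0;
    size_t used = 0;
    char ra[32], rb[32];

    assert(outsize > 0);
    out[0] = '\0';
    while (i <= lenA || j <= lenB) {
        int a0, b0, next_b, n;

        if (i <= lenA && match[i] == j) {   /* lines are paired: skip */
            i++;
            j++;
            continue;
        }
        a0 = i;                             /* unpaired run in fileA */
        while (i <= lenA && match[i] == 0)
            i++;
        b0 = j;                             /* fileB lines up to next pair */
        next_b = (i <= lenA) ? match[i] : lenB + 1;
        assert(next_b >= j);                /* pairs must be increasing */
        j = next_b;

        if (a0 > i - 1) {                   /* nothing from fileA: append */
            fmt_range(rb, sizeof rb, b0, j - 1);
            n = snprintf(out + used, outsize - used, "%da%s\n", a0 - 1, rb);
        } else if (b0 > j - 1) {            /* nothing from fileB: delete */
            fmt_range(ra, sizeof ra, a0, i - 1);
            n = snprintf(out + used, outsize - used, "%sd%d\n", ra, b0 - 1);
        } else {                            /* both sides differ: change */
            fmt_range(ra, sizeof ra, a0, i - 1);
            fmt_range(rb, sizeof rb, b0, j - 1);
            n = snprintf(out + used, outsize - used, "%sc%s\n", ra, rb);
        }
        if (n < 0 || (size_t)n >= outsize - used)
            return -1;
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Called with the Match[] vector shown above (0 1 3 0 5 0 6) and six
lines in each file, the sketch yields the three commands 1a2, 3c4 and
5d5.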
********************************************************************
/*
* s t r e q . c
*/
String Equality Test
Synopsis:
streq(a, b);
char *a;
char *b;
Description:
Return TRUE if the strings are equal.
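A minimal sketch of a routine with this behaviour, written in ANSI C
rather than the K&R style of the synopsis. The body is an assumption
(the original source is not reproduced here); it simply wraps strcmp().

```c
#include <assert.h>
#include <string.h>

#ifndef TRUE
#define TRUE  1
#define FALSE 0
#endif

/* Return TRUE if the two strings are equal, FALSE otherwise. */
int streq(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0 ? TRUE : FALSE;
}
```

For example, streq("diff", "diff") returns TRUE, while
streq("diff", "difF") returns FALSE because the comparison is case
sensitive.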
Bugs
***************************************************************
/*
* e r r o r . c
*/
Fatal Error Exit
Synopsis:
_error()
error(format, args)
char *format;
Documentation:
Fatal error exits. _error() halts, error() prints something
on stderr and then halts.
Bugs:
THIS DOES NOT WORK ON MANY SYSTEMS DUE TO EXTREMELY NON-PORTABLE CODE.
Why oh why can't people learn to use varargs properly? This code will
blow up on OSK. Fortunately, it isn't used often...
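For comparison, a portable version of error() can be sketched with the
standard <stdarg.h> interface. This is an illustration under
assumptions, not the code shipped with diff: error_message is a
hypothetical helper added so the formatting can be exercised without
exiting, and the exit status EXIT_FAILURE is a choice made for this
example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format a message into buf using printf-style arguments.
 * Returns what vsnprintf() returns: the length the full message needs.
 * Hypothetical helper, not part of the original error.c.
 */
int error_message(char *buf, size_t size, const char *format, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && format != NULL);
    va_start(ap, format);
    n = vsnprintf(buf, size, format, ap);
    va_end(ap);
    return n;
}

/* Print a printf-style message on stderr, then halt. */
void error(const char *format, ...)
{
    va_list ap;

    assert(format != NULL);
    va_start(ap, format);
    vfprintf(stderr, format, ap);
    va_end(ap);
    exit(EXIT_FAILURE);
}
```

A call such as error("Can't open input file \"%s\".\n", name) then
works on any system with a conforming C library, since the argument
list is walked by va_start/va_end instead of by assumptions about the
stack layout.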