diffn is a small program intended for comparing a set of files for pairwise equality. There are ways to do this using standard utilities, but they are inefficient and therefore unsuitable for large files. For example, you could run diff for every pair of files:
    for i in <files> ; do
        for j in <files> ; do
            diff -q "$i" "$j"
        done
    done
But that would perform every comparison twice, and read every file n times. A different possibility is computing a hash:
md5sum <files> | sort
Then one could visually compare the MD5 sums (with equal ones next to each other because of the sort). But this reads all the data from each file, which makes it slow for large files.
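The visual comparison can also be left to a script. The following is only a sketch, not part of diffn: the function name md5_groups is made up for illustration, and it assumes GNU md5sum (32 hex digits followed by two separator characters) and file names without newlines or backslashes. Like the command above, it still reads every file completely.

```shell
# Print each group of files sharing an MD5 sum on one line, tab-separated.
# Hypothetical helper; assumes GNU md5sum output: "<32 hex digits>  <name>".
md5_groups() {
    md5sum -- "$@" | sort | awk '
        {
            sum = $1
            name = substr($0, 35)          # skip 32 hex digits + 2 separators
            if (sum == prev) {
                line = line "\t" name      # same sum as the previous file
                n++
            } else {
                if (n > 1) print line      # flush the finished group
                line = name; n = 1; prev = sum
            }
        }
        END { if (n > 1) print line }'
}
```

Files whose checksum occurs only once are not printed at all.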
diffn reads each file only once, and only up to the point when it is seen to be different from all others. Before starting to read, it compares file sizes to sort out differently sized files. The way it outputs equal files can be adapted to facilitate reading its output with a different program.
Get diffn: Download gzipped tar archive
diffn - compare n files for equality
diffn determines which of a set of files have the same content or are otherwise equivalent. It is intended for comparing large, possibly binary files, some of which are duplicates. Because it reads each file only once and stops at the first difference, it is more efficient than n-by-n diff or comparing md5sums. It can also handle special files such as directories, symbolic links, devices, sockets and others.
diffn distinguishes two kinds of equality. Files are considered identical when they refer to the same inode. Symbolic links are considered identical to their targets unless the -l option is given. Otherwise, only hard links are identical. Regular files are considered equal if their contents are the same. Device files are equal if their major and minor numbers are equal. All other special files are not equal to each other.
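The difference between identical and equal regular files can be illustrated with standard tools. This is a rough sketch, not how diffn is implemented: the function name relation is hypothetical, the -ef test follows symbolic links (comparable to running diffn without -l), and device files and other special files are not handled.

```shell
# Classify two paths as identical, equal or different.
# Hypothetical helper; only meaningful for regular files and links to them.
relation() {
    if [ "$1" -ef "$2" ]; then
        echo identical      # same device and inode (hard link or symlink)
    elif cmp -s -- "$1" "$2"; then
        echo equal          # different inodes, same contents
    else
        echo different
    fi
}
```

For example, a file and a hard link to it are reported as identical, while a file and a copy made with cp are reported as equal.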
diffn tries to be as efficient as possible when comparing regular files. File sizes are compared first. Files are read and compared block by block. Once a file is found to be different from all others, it is output and removed from the set. Only duplicate files have to be read completely, and this is unavoidable because there could always be a difference later on.
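The size check followed by a content comparison that stops at the first difference can be approximated in a shell script. The sketch below is not diffn's algorithm: it compares each file with one representative per class using cmp, so a file may be read several times, whereas diffn reads each file only once. The function name classify and its output format are assumptions made for this illustration.

```shell
# Print every file preceded by the number of its equivalence class.
# Hypothetical helper; uses global shell variables rep_N and rsize_N.
classify() {
    n=0                                  # number of classes found so far
    for f in "$@"; do
        size=$(wc -c < "$f" | tr -d ' ')
        i=1 class=0
        while [ "$i" -le "$n" ]; do
            eval "rep=\$rep_$i; rsize=\$rsize_$i"
            # A size mismatch rules out equality without reading any data;
            # cmp -s stops reading at the first differing byte.
            if [ "$size" -eq "$rsize" ] && cmp -s -- "$f" "$rep"; then
                class=$i
                break
            fi
            i=$((i + 1))
        done
        if [ "$class" -eq 0 ]; then      # no match: f starts a new class
            n=$((n + 1)); class=$n
            eval "rep_$n=\$f; rsize_$n=\$size"
        fi
        printf '%s %s\n' "$class" "$f"
    done
}
```

Files printed with the same class number have the same contents. Piping the result through sort -n places the members of each class next to each other.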
Each set of equal files is output one file per line. All but the first are indented and prepended with "==". Files identical to their predecessors are indented more and prepended by "===". This can be changed with the -*sep options. Because files are output as they are found to be unique, the output is not sorted in any way.
Print brief usage information.
Do not dereference symbolic links. Symlinks will not be considered equal to their targets, or to other symlinks with the same target.
Do not output anything, but return 1 if all files are unique.
Do not output anything, but return 1 if not all files are equal.
Set the separator strings between sets of equal files, between equal and identical files, respectively. --eqsep also sets the separator between identical files unless --idsep overrides it. This allows diffn's output to be tailored for further automated processing. For example, passing --eqsep " " will print one set of equal files per line.
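As an illustration of such automated processing, the following sketch filters one-set-per-line output for sets containing duplicates. It rests on assumptions not confirmed by this manual: that --eqsep accepts a tab character as its argument, that unique files appear as a line of their own, and that no file name contains a tab. The function name dup_sets is made up for this example.

```shell
# Read one set of equal files per line (names separated by tabs) from
# standard input and print only the sets with more than one member.
# Hypothetical helper; assumed invocation:
#   diffn --eqsep "$(printf '\t')" file1 file2 ... | dup_sets
dup_sets() {
    awk -F '\t' 'NF > 1'
}
```

A tab is used here instead of a space so that file names containing spaces remain unambiguous.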
If the -q or -Q option is given, a return value of 1 indicates differences. A return value greater than 1 indicates an error. Otherwise, 0 is returned.
diffn is (c) 2010 Volker Schatz. It is free software and may be redistributed and/or modified under the terms of the GNU General Public License, version 3 or later.