1.7 Binary Files and Forcing Text Comparisons
=============================================
If 'diff' thinks that either of the two files it is comparing is binary
(a non-text file), it normally treats that pair of files much as if the
summary output format had been selected (see Brief), and reports
only that the binary files are different. This is because line by line
comparisons are usually not meaningful for binary files. This does not
count as trouble, even though the resulting output does not capture all
the differences.
'diff' determines whether a file is text or binary by checking the
first part of the file; the exact number of bytes checked is system
dependent, but it is typically several thousand. If every byte in that
part of the file is non-null, 'diff' considers the file to be text;
otherwise it considers the file to be binary.
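
The null-byte test described above can be approximated in a few lines of
Python. This is only a minimal sketch of the heuristic, not the code
'diff' itself uses: the function names and the 4096-byte probe size are
assumptions chosen for illustration, since the real number of bytes
checked is system dependent.

```python
def looks_binary(data, probe_size=4096):
    """Return True if a null byte occurs in the first probe_size bytes.

    probe_size is a hypothetical value; the real limit is system dependent.
    """
    return b"\0" in data[:probe_size]


def file_looks_binary(path, probe_size=4096):
    """Apply the same check to the leading bytes of a file on disk."""
    with open(path, "rb") as f:
        return looks_binary(f.read(probe_size), probe_size)


if __name__ == "__main__":
    print(looks_binary(b"plain text\n"))       # no null byte in the probe
    print(looks_binary(b"text with \0 byte"))  # null byte inside the probe
```

Under this sketch, a null byte that appears only after the probed region
does not cause the data to be classified as binary.
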
Sometimes you might want to force 'diff' to consider files to be
text. For example, you might be comparing text files that contain null
characters; 'diff' would erroneously decide that those are non-text
files. Or you might be comparing documents that are in a format used by
a word processing system that uses null characters to indicate special
formatting. You can force 'diff' to consider all files to be text
files, and compare them line by line, by using the '--text' ('-a')
option. If the files you compare using this option do not in fact
contain text, they will probably contain few newline characters, and the
'diff' output will consist of hunks showing differences between long
lines of whatever characters the files contain.
You can also force 'diff' to report only whether files differ (but
not how). Use the '--brief' ('-q') option for this.
In operating systems that distinguish between text and binary files,
'diff' normally reads and writes all data as text. Use the '--binary'
option to force 'diff' to read and write binary data instead. This
option has no effect on a POSIX-compliant system like GNU or traditional
Unix. However, many personal computer operating systems represent the
end of a line with a carriage return followed by a newline. On such
systems, 'diff' normally ignores these carriage returns on input and
generates them at the end of each output line, but with the '--binary'
option 'diff' treats each carriage return as just another input
character, and does not generate a carriage return at the end of each
output line. This can be useful when dealing with non-text files that
are meant to be interchanged with POSIX-compliant systems.
The '--strip-trailing-cr' option causes 'diff' to treat input lines that end
in carriage return followed by newline as if they end in plain newline.
This can be useful when comparing text that is imperfectly imported from
many personal computer operating systems. This option affects how lines
are read, which in turn affects how they are compared and output.
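
The effect of reading lines this way can be illustrated with a small
Python sketch. It is an approximation under stated assumptions, not the
implementation in 'diff': the helper names are hypothetical, and the
sketch simply rewrites each carriage return that is immediately followed
by a newline, leaving any other carriage return untouched.

```python
def normalize_line_endings(data):
    """Treat lines ending in CR LF as if they ended in a plain LF."""
    return data.replace(b"\r\n", b"\n")


def same_text_ignoring_trailing_cr(a, b):
    """Compare two byte strings after normalizing their line endings."""
    return normalize_line_endings(a) == normalize_line_endings(b)


if __name__ == "__main__":
    dos_text = b"first\r\nsecond\r\n"
    unix_text = b"first\nsecond\n"
    print(same_text_ignoring_trailing_cr(dos_text, unix_text))
```

Because the normalization happens when the data is read, the two inputs
in the example compare as equal even though their bytes differ.
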
If you want to compare two files byte by byte, you can use the 'cmp'
program with the '--verbose' ('-l') option to show the values of each
differing byte in the two files. With GNU 'cmp', you can also use the
'-b' or '--print-bytes' option to show the ASCII representation of those
bytes. See Invoking cmp for more information.
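
To make the idea of a byte-by-byte comparison concrete, here is a
minimal Python sketch that lists each position where two byte strings
differ, together with the two byte values. It is not a reimplementation
of 'cmp': the function name is hypothetical, positions are counted from
1 as an assumption of this sketch, and bytes beyond the end of the
shorter input are simply ignored.

```python
def differing_bytes(a, b):
    """Return (position, byte_in_a, byte_in_b) for each differing byte.

    Positions are 1-based; comparison stops at the end of the shorter input.
    """
    return [
        (pos, x, y)
        for pos, (x, y) in enumerate(zip(a, b), start=1)
        if x != y
    ]


if __name__ == "__main__":
    for pos, x, y in differing_bytes(b"abc", b"abd"):
        print(pos, x, y)
```
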
If 'diff3' thinks that any of the files it is comparing is binary (a
non-text file), it normally reports an error, because such comparisons
are usually not useful. 'diff3' uses the same test as 'diff' to decide
whether a file is binary. As with 'diff', if the input files contain a
few non-text bytes but otherwise are like text files, you can force
'diff3' to consider all files to be text files and compare them line by
line by using the '-a' or '--text' option.