Methods to Remove Unsemantic Text Prior to Diff
diff* methods, in particular diffPrint, modify the text
representation of an object prior to running the diff to reduce the incidence
of spurious mismatches caused by unsemantic differences. For example, we
look to remove matrix row indices and atomic vector indices (i.e. the
[1] or [1,] strings at the beginning of each display line).
trimPrint(obj, obj.as.chr)

## S4 method for signature 'ANY,character'
trimPrint(obj, obj.as.chr)

trimStr(obj, obj.as.chr)

## S4 method for signature 'ANY,character'
trimStr(obj, obj.as.chr)

trimChr(obj, obj.as.chr)

## S4 method for signature 'ANY,character'
trimChr(obj, obj.as.chr)

trimDeparse(obj, obj.as.chr)

## S4 method for signature 'ANY,character'
trimDeparse(obj, obj.as.chr)

trimFile(obj, obj.as.chr)

## S4 method for signature 'ANY,character'
trimFile(obj, obj.as.chr)
Consider two printed matrices whose shared data (rows containing 11 and 12) sits in rows 2 and 3 of the first and rows 1 and 2 of the second. A line-by-line diff would find all rows of the matrix to be mismatched because, where the data matches, the indices do not. By trimming out the row indices before the diff, the diff can recognize that rows 2 and 3 of the first matrix should be matched to rows 1 and 2 of the second.
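The matrix case described above can be reproduced with a small base R sketch. It does not call the package; the regular expression `row.idx` is an assumption made for illustration and is not necessarily the rule trimPrint applies internally.

```r
# Capture the printed form of two matrices that share the values 11 and 12
a <- capture.output(print(matrix(10:12)))
b <- capture.output(print(matrix(11:12)))

# Assumed pattern for default row indices such as "[1,]" at the start of a line
row.idx <- "^\\[[0-9]+,\\]"

# With indices in place, no data row of `a` has an identical line in `b`
# (the first element of each vector is the column header, so it is dropped)
a[-1] %in% b[-1]

# With indices stripped, the rows holding 11 and 12 can be matched
a.trim <- sub(row.idx, "", a)
b.trim <- sub(row.idx, "", b)
a.trim[-1] %in% b.trim[-1]
```

The sketch deletes the index text outright for simplicity, whereas the trim* methods report which columns to keep so that the indices can be restored for display.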
These methods follow a similar interface as the
methods, with one available for each
diff* method except for diffCsv, since that one uses diffPrint internally. The
unsemantic differences are added back after the diff for display purposes,
and are colored in grey to indicate they are ignored in the diff.
Currently only trimPrint and trimStr do anything meaningful.
trimPrint removes row index headers provided that they are of the
default un-named variety. If you add row names, or if numeric row indices
are not ascending from 1, they will not be stripped as those have meaning.
trimStr removes the ..$ and ..- tokens
to minimize spurious matches.
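As a rough illustration of the kind of token involved, the sketch below strips leading nesting markers from a few hand-written lines in the style of str output. Both the sample lines and the pattern `tok` are assumptions for illustration; the actual rule used by trimStr may differ.

```r
# Hand-written sample lines in the style of str() output for a nested list
str.lines <- c(
  "List of 2",
  " $ a: num 1",
  " $ b:List of 1",
  "  ..$ c: chr \"x\"",
  "  ..- attr(*, \"class\")= chr \"foo\""
)

# Assumed pattern: optional indentation, any number of ".." groups,
# then a "$" or "-" marker followed by a space
tok <- "^ *(\\.\\. ?)*[$-] "

# Lines without a leading marker (e.g. "List of 2") are left unchanged
trimmed <- sub(tok, "", str.lines)
trimmed
```

With the markers gone, an element that moves to a different nesting depth can still be compared on its name and content alone.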
You can modify how text is trimmed by providing your own functions to the
trim argument of the
diff* methods, or by defining
trim* methods for your objects. Note that the return value for these
functions is the start and end columns of the text that should be
kept and used in the diff.
The return value is an integer matrix with length(obj.as.chr) rows and 2
columns, holding the start (first column) and end (second column) character
positions of the substring to run diffs on.
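A minimal sketch of a custom trim function with this return shape follows. The function name `trim_prefix` and the "NNN: " prefix it ignores are hypothetical, chosen only to illustrate the start/end matrix; the function takes the same two arguments as the trim* methods.

```r
# Hypothetical trim function: exclude a leading "NNN: " prefix from the diff
trim_prefix <- function(obj, obj.as.chr) {
  # Length of the prefix on each line; regexpr reports -1 where there is none
  m <- regexpr("^[0-9]+: ", obj.as.chr)
  pre.len <- pmax(attr(m, "match.length"), 0L)
  # One row per line: column 1 is the first kept character, column 2 the last
  cbind(pre.len + 1L, nchar(obj.as.chr))
}

txt <- c("1: alpha", "22: beta", "gamma")
pos <- trim_prefix(NULL, txt)

# The substrings selected by the matrix are what would be compared
kept <- substr(txt, pos[, 1], pos[, 2])
kept
```

Per the description above, a function like this would be supplied through the trim argument of a diff* method, for example `diffPrint(target, current, trim = trim_prefix)` with `target` and `current` standing in for the objects being compared.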
obj.as.chr will be post