diffCsv: Diff CSV Files


Description

Reads CSV files with read.csv and passes the resulting data frames on to diffPrint. The values in extra are passed as arguments to both read.csv and print. If you wish to use different extra arguments for each of those functions, you will need to read.csv the files and pass the data frames to diffPrint yourself.
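The manual route described above can be sketched as follows. This is a minimal illustration, not the package's own example: the small data frames and the particular arguments (colClasses for reading, digits for printing) are assumptions chosen only to show that each step can take its own arguments.

```r
library(diffobj)

# Two small CSV files that differ in one cell (illustrative data)
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
df1 <- data.frame(id = 1:3, value = c(1.5, 2.25, 3.125))
df2 <- df1
df2$value[2] <- 9.75
write.csv(df1, f1, row.names = FALSE)
write.csv(df2, f2, row.names = FALSE)

# Read step: arguments intended only for read.csv
tar <- read.csv(f1, colClasses = c("integer", "numeric"))
cur <- read.csv(f2, colClasses = c("integer", "numeric"))

# Print step: arguments intended only for print go in `extra`
d <- diffPrint(tar, cur, extra = list(digits = 3), pager = "off", format = "raw")
d

unlink(c(f1, f2))
```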

Usage

diffCsv(target, current, ...)

## S4 method for signature 'ANY'
diffCsv(
  target,
  current,
  mode = gdo("mode"),
  context = gdo("context"),
  format = gdo("format"),
  brightness = gdo("brightness"),
  color.mode = gdo("color.mode"),
  word.diff = gdo("word.diff"),
  pager = gdo("pager"),
  guides = gdo("guides"),
  trim = gdo("trim"),
  rds = gdo("rds"),
  unwrap.atomic = gdo("unwrap.atomic"),
  max.diffs = gdo("max.diffs"),
  disp.width = gdo("disp.width"),
  ignore.white.space = gdo("ignore.white.space"),
  convert.hz.white.space = gdo("convert.hz.white.space"),
  tab.stops = gdo("tab.stops"),
  line.limit = gdo("line.limit"),
  hunk.limit = gdo("hunk.limit"),
  align = gdo("align"),
  style = gdo("style"),
  palette.of.styles = gdo("palette"),
  frame = par_frame(),
  interactive = gdo("interactive"),
  term.colors = gdo("term.colors"),
  tar.banner = NULL,
  cur.banner = NULL,
  strip.sgr = gdo("strip.sgr"),
  sgr.supported = gdo("sgr.supported"),
  extra = list()
)

Arguments

target

character(1L) or file connection with read capability; if character should point to a CSV file

current

like target

...

unused, for compatibility of methods with generics

mode

character(1L), one of:

  • “unified”: diff mode used by git diff

  • “sidebyside”: line up the differences side by side

  • “context”: show the target and current hunks in their entirety; this mode takes up a lot of screen space but makes it easier to see what the objects actually look like

  • “auto”: default mode; pick one of the above, will favor “sidebyside” unless getOption("width") is less than 80, or in diffPrint and objects are dimensioned and do not fit side by side, or in diffChr, diffDeparse, diffFile and output does not fit in side by side without wrapping
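As a rough illustration of the mode values listed above, the sketch below renders the same pair of files in “unified” and “sidebyside” mode. The iris data mirror the Examples section; format="raw" and pager="off" are used only to keep the output plain and on the console.

```r
library(diffobj)

f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

# Same comparison, two layouts
d.uni <- diffCsv(f1, f2, mode = "unified", pager = "off", format = "raw")
d.sbs <- diffCsv(f1, f2, mode = "sidebyside", pager = "off", format = "raw")
d.uni
d.sbs

unlink(c(f1, f2))
```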

context

integer(1L) how many lines of context are shown on either side of differences (defaults to 2). Set to -1L to allow as many as there are. Set to “auto” to display as many as 10 lines or as few as 1 depending on whether total screen lines fit within the number of lines specified in line.limit. Alternatively pass the return value of auto_context to fine tune the parameters of the auto context calculation.
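The effect of context can be seen by varying it on one comparison. The sketch below assumes the iris files from the Examples section; the specific values (1L, 5L, and the min/max passed to auto_context) are arbitrary choices for illustration.

```r
library(diffobj)

f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

# Fixed amounts of context around the changed row
d.narrow <- diffCsv(f1, f2, context = 1L, pager = "off", format = "raw")
d.wide   <- diffCsv(f1, f2, context = 5L, pager = "off", format = "raw")

# Automatic context with custom bounds
d.auto <- diffCsv(
  f1, f2, context = auto_context(min = 1L, max = 3L),
  pager = "off", format = "raw"
)
d.narrow
d.wide

unlink(c(f1, f2))
```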

format

character(1L), controls the diff output format, one of:

  • “auto”: to select output format based on terminal capabilities; will attempt to use one of the ANSI formats if they appear to be supported, and if not or if you are in the Rstudio console it will attempt to use HTML and browser output if in interactive mode.

  • “raw”: plain text

  • “ansi8”: color and format diffs using basic ANSI escape sequences

  • “ansi256”: like “ansi8”, except using the full range of ANSI formatting options

  • “html”: color and format using HTML markup; the resulting string is processed with enc2utf8 when output as a full web page (see docs for html.output under Style).

Defaults to “auto”. See palette.of.styles for details on customization, style for full control of output format. See 'pager' parameter for more discussion of Rstudio behavior.

brightness

character, one of “light”, “dark”, “neutral”, useful for adjusting color scheme to light or dark terminals. “neutral” by default. See PaletteOfStyles for details and limitations. Advanced: you may specify brightness as a function of format. For example, if you typically wish to use a “dark” color scheme, except for when in “html” format when you prefer the “light” scheme, you may use c("dark", html="light") as the value for this parameter. This is particularly useful if format is set to “auto” or if you want to specify a default value for this parameter via options. Any names you use should correspond to a format. You must have one unnamed value which will be used as the default for all formats that are not explicitly specified.
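The per-format brightness specification described above might be used as in the following sketch. The data are illustrative, and format="ansi8" is set explicitly here only so the unnamed “dark” default is the entry that applies.

```r
library(diffobj)

f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

# "dark" for every format except "html", which uses "light"
d <- diffCsv(
  f1, f2,
  brightness = c("dark", html = "light"),
  format = "ansi8", pager = "off"
)
d

unlink(c(f1, f2))
```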

color.mode

character, one of “rgb” or “yb”. Defaults to “yb”. “yb” stands for “Yellow-Blue” for color schemes that rely primarily on those colors to style diffs. Those colors can be easily distinguished by individuals with limited red-green color sensitivity. See PaletteOfStyles for details and limitations. Also offers the same advanced usage as the brightness parameter.

word.diff

TRUE (default) or FALSE, whether to run a secondary word diff on the in-hunk differences. For atomic vectors setting this to FALSE could make the diff slower (see the unwrap.atomic parameter). For other uses, particularly with diffChr setting this to FALSE can substantially improve performance.

pager

one of “auto” (default), “on”, “off”, a Pager object, or a list; controls whether and how a pager is used to display the diff output. If you require a particular pager behavior you must use a Pager object, or “off” to turn off the pager. All other settings will interact with other parameters such as format, style, as well as with your system capabilities in order to select the pager expected to be most useful.

“auto” and “on” are the same, except that in non-interactive mode “auto” is equivalent to “off”. “off” will always send output to the console. If “on”, whether the output actually gets routed to the pager depends on the pager threshold setting (see Pager). The default behavior is to use the pager associated with the Style object. The Style object is itself determined by the format or style parameters.

Depending on your system configuration different styles and corresponding pagers will get selected, unless you specify a Pager object directly. On a system with a system pager that supports ANSI CSI SGR colors, the pager will only trigger if the output is taller than one window. If the system pager is not known to support ANSI colors then the output will be sent as HTML to the IDE viewer if available or to the web browser if not. Even though Rstudio now supports ANSI CSI SGR at the console, output is still formatted as HTML and sent to the IDE viewer. Partly this is for continuity of behavior, but also because the default Rstudio pager does not support ANSI CSI SGR, at least as of this writing.

If pager is a list, then the same as with “on”, except that the Pager object associated with the selected Style object is re-instantiated with the union of the list elements and the existing settings of that Pager. The list should contain named elements that correspond to the Pager instantiation parameters. The names must be specified in full as partial parameter matching will not be carried out because the pager is re-instantiated with new.

See Pager, Style, and PaletteOfStyles for more details and for instructions on how to modify the default behavior.

guides

TRUE (default), FALSE, or a function that accepts at least two arguments and requires no more than two arguments. Guides are additional context lines that are not strictly part of a hunk, but provide important contextual data (e.g. column headers). If TRUE, the context lines are shown in addition to the normal diff output, typically in a different color to indicate they are not part of the hunk. If a function, the function should accept as the first argument the object being diffed, and the second the character representation of the object. The function should return the indices of the elements of the character representation that should be treated as guides. See guides for more details.

trim

TRUE (default), FALSE, or a function that accepts at least two arguments and requires no more than two arguments. Function should compute for each line in captured output what portion of those lines should be diffed. By default, this is used to remove row meta data differences (e.g. [1,]) so they alone do not show up as differences in the diff. See trim for more details.

rds

TRUE (default) or FALSE, if TRUE will check whether target and/or current point to a file that can be read with readRDS and if so, loads the R object contained in the file and carries out the diff on the object instead of the original argument. Currently there is no mechanism for specifying additional arguments to readRDS.

unwrap.atomic

TRUE (default) or FALSE. Relevant primarily for diffPrint, if TRUE, and word.diff is also TRUE, and both target and current are unnamed one-dimension atomics, the vectors are unwrapped and diffed element by element, and then re-wrapped. Since diffPrint is fundamentally a line diff, the re-wrapped lines are lined up in a manner that is as consistent as possible with the unwrapped diff. Lines that contain the location of the word differences will be paired up. Since the vectors may well be wrapped with different periodicities this will result in lines that are paired up that look like they should not be paired up, though the locations of the differences should be. It is entirely possible that setting this parameter to FALSE will result in a slower diff. This happens if two vectors are actually fairly similar, but their line representations are not. For example, in comparing 1:100 to c(100, 1:99), there is really only one difference at the “word” level, but every screen line is different. diffChr will also do the unwrapping if it is given a character vector that contains output that looks like the atomic vectors described above. This is a bug, but as the functionality could be useful when diffing e.g. capture.output data, we now declare it a feature.

max.diffs

integer(1L), number of differences (default 50000L) after which we abandon the O(n^2) diff algorithm in favor of a naive O(n) one. Set to -1L to stick to the original algorithm up to the maximum allowed (~INT_MAX/4).

disp.width

integer(1L) number of display columns to take up; note that in “sidebyside” mode the effective display width is half this number. Set to 0L to use default widths, which are getOption("width") for normal styles and 80L for HTML styles. Future versions of diffobj may change this to larger values for two dimensional objects for better diffs (see details).

ignore.white.space

TRUE or FALSE, whether to ignore differences in horizontal whitespace (i.e. spaces and tabs) when comparing (defaults to TRUE).

convert.hz.white.space

TRUE or FALSE, whether to modify input strings that contain tabs and carriage returns in such a way that they display as they would with those characters, but without using those characters (defaults to TRUE). The conversion assumes that tab stops are spaced evenly eight characters apart on the terminal. If this is not the case you may specify the tab stops explicitly with tab.stops.

tab.stops

integer, what tab stops to use when converting hard tabs to spaces. If not integer will be coerced to integer (defaults to 8L). You may specify more than one tab stop. If display width exceeds that addressable by your tab stops the last tab stop will be repeated.

line.limit

integer(2L) or integer(1L), if length 1 how many lines of output to show, where -1 means no limit. If length 2, the first value indicates the threshold of screen lines to begin truncating output, and the second the number of lines to truncate to, which should be fewer than the threshold. Note that this parameter is implemented on a best-efforts basis and should not be relied on to produce the exact number of lines requested. In particular do not expect it to work well for values small enough that the banner portion of the diff would have to be trimmed. If you want a specific number of lines use [ or head / tail. One advantage of line.limit over these other options is that you can combine it with context="auto" and auto max.level selection (the latter for diffStr), which allows the diff to dynamically adjust to make best use of the available display lines. [, head, and tail just subset the text of the output.

hunk.limit

integer(2L) or integer(1L), how many diff hunks to show. Behaves similarly to line.limit. How many hunks are in a particular diff is a function of how many differences there are, and also of how much context is used, since context can cause two hunks to bleed into each other and become one.

align

numeric(1L) between 0 and 1, proportion of words in a line of target that must be matched in a line of current in the same hunk for those lines to be paired up when displayed (defaults to 0.25), or an AlignThreshold object. Set to 1 to turn off alignment which will cause all lines in a hunk from target to show up first, followed by all lines from current. Note that in order to be aligned lines must meet the threshold and have at least 3 matching alphanumeric characters (see AlignThreshold for details).

style

“auto”, a Style object, or a list. “auto” by default. If a Style object, will override the format, brightness, and color.mode parameters. The Style object provides full control of diff output styling. If a list, then the same as “auto”, except that if the auto-selected Style requires instantiation (see PaletteOfStyles), then the list contents will be used as arguments when instantiating the style object. See Style for more details, in particular the examples.

palette.of.styles

PaletteOfStyles object; advanced usage, contains all the Style objects or “classRepresentation” objects extending Style that are selected by specifying the format, brightness, and color.mode parameters. See PaletteOfStyles for more details.

frame

an environment to use as the evaluation frame for the print/show/str calls, and for diffObj, the evaluation frame for the diffPrint / diffStr calls. Defaults to the return value of par_frame.

interactive

TRUE or FALSE, whether the function is being run in interactive mode; defaults to the return value of interactive. If in interactive mode, pager will be used if pager is “auto”, and if ANSI styles are not supported and style is “auto”, output will be sent to viewer/browser as HTML.

term.colors

integer(1L) how many ANSI colors are supported by the terminal. This variable is provided for when crayon::num_colors does not properly detect how many ANSI colors are supported by your terminal. Defaults to return value of crayon::num_colors and should be 8 or 256 to allow ANSI colors, or any other number to disallow them. This only impacts output format selection when style and format are both set to “auto”.

tar.banner

character(1L), language, or NULL, used to generate the text to display ahead of the diff section representing the target output. If NULL will use the deparsed target expression, if language, will use the language as it would the target expression, if character(1L), will use the string with no modifications. The language mode is provided because diffStr modifies the expression prior to display (e.g. by wrapping it in a call to str). Note that it is possible in some cases that the substituted value of target actually is character(1L), but if you provide a character(1L) value here it will be assumed you intend to use that value literally.

cur.banner

character(1L) like tar.banner, but for current
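A short sketch of literal banners, assuming the iris files from the Examples section; the labels "original" and "modified" are placeholders chosen for illustration.

```r
library(diffobj)

f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

# Replace the default deparsed-expression banners with fixed strings
d <- diffCsv(
  f1, f2,
  tar.banner = "original", cur.banner = "modified",
  pager = "off", format = "raw"
)
d

unlink(c(f1, f2))
```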

strip.sgr

TRUE, FALSE, or NULL (default), whether to strip ANSI CSI SGR sequences prior to comparison and for display of diff. If NULL, resolves to TRUE if 'style' resolves to an ANSI formatted diff, and FALSE otherwise. The default behavior is to avoid confusing diffs where the original SGR and the SGR added by the diff are mixed together.

sgr.supported

TRUE, FALSE, or NULL (default), whether to assume the standard output device supports ANSI CSI SGR sequences. If TRUE, strings will be manipulated accounting for the SGR sequences. If NULL, resolves to TRUE if 'style' resolves to an ANSI formatted diff, and to 'crayon::has_color()' otherwise. This only controls how the strings are manipulated, not whether SGR is added to format the diff, which is controlled by the 'style' parameter. This parameter is exposed for the rare cases where you might wish to control string manipulation behavior directly.

extra

list, additional arguments to pass on to the functions used to create text representation of the objects to diff (e.g. print, str, etc.)

Value

a Diff object; see diffPrint.
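The returned Diff object can be inspected programmatically as well as printed. The sketch below uses the any and as.character methods that diffobj provides for Diff objects; the data are illustrative.

```r
library(diffobj)

f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

d <- diffCsv(f1, f2, pager = "off", format = "raw")
unlink(c(f1, f2))

has.diffs <- any(d)       # logical: were differences found?
txt <- as.character(d)    # the rendered diff, one element per output line
has.diffs
head(txt)
```

Capturing the text this way can be handy in scripts or tests where the diff should be logged rather than displayed.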

See Also

diffPrint for details on the diff* functions, diffObj, diffStr, diffChr to compare character vectors directly, ses for a minimal and fast diff

Examples

iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile()
f2 <- tempfile()
write.csv(iris, f1, row.names=FALSE)
write.csv(iris.2, f2, row.names=FALSE)
## `pager="off"` for CRAN compliance; you may omit in normal use
diffCsv(f1, f2, pager="off")
unlink(c(f1, f2))

Example output

< f1                                                              
> f2                                                              
@@ 4,5 / 4,5 @@                                                   
~     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
  3            4.7         3.2          1.3         0.2     setosa
  4            4.6         3.1          1.5         0.2     setosa
< 5            5.0         3.6          1.4         0.2     setosa
> 5           99.0         3.6          1.4         0.2     setosa
  6            5.4         3.9          1.7         0.4     setosa
  7            4.6         3.4          1.4         0.3     setosa

diffobj documentation built on Oct. 5, 2021, 9:07 a.m.