compare_diff: Compare two datasets for differences
In manydata: Many Global Governance Datacubes

Description

Compare two datasets for differences, matching rows on a shared key column (`by`).

Usage

compare_new(.data1, .data2, by = "ID")

compare_diff(
  .data1,
  .data2,
  by = "ID",
  exclude = c("Title", "Coder", "Comments"),
  diff_threshold = 0
)

Arguments

`.data1`	First dataset to compare
`.data2`	Second dataset to compare
`by`	Column name to join on (default is "ID")
`exclude`	Character vector of column names to exclude from comparison. By default, "Title", "Coder", and "Comments" are excluded.
`diff_threshold`	Integer specifying the minimum number of differing columns for a row to be included in the output. Default is 0, meaning any difference will be included. For example, set to 3 to show only rows with at least 3 differing columns.
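To illustrate how `exclude` and `diff_threshold` could interact, here is a minimal base R sketch. It is not the package's implementation. The data frames `a` and `b`, the variable names, and the exact filtering rule (at least one difference, and at least `threshold` differing columns) are assumptions made for illustration only.

```r
# Hypothetical data, not taken from manydata
a <- data.frame(ID = 1:3, Begin = c(10, 20, 30), End = c(11, 21, 31),
                Coder = c("x", "y", "z"))
b <- data.frame(ID = 1:3, Begin = c(10, 25, 30), End = c(11, 29, 31),
                Coder = c("p", "q", "r"))

by <- "ID"
exclude <- c("Title", "Coder", "Comments")
threshold <- 2  # assumed to play the role of diff_threshold

# Columns to compare: shared by both datasets, minus the key and excluded names
cols <- setdiff(intersect(names(a), names(b)), c(by, exclude))

# Align rows on the key; shared non-key columns get the suffixes .1 and .2
m <- merge(a, b, by = by, suffixes = c(".1", ".2"))

# For each row, count how many of the compared columns differ
differs <- vapply(cols, function(col) {
  m[[paste0(col, ".1")]] != m[[paste0(col, ".2")]]
}, logical(nrow(m)))
n_diff <- rowSums(differs)

# Keep rows with at least one difference that also meet the threshold
kept <- m[n_diff > 0 & n_diff >= threshold, ]
kept
```

In this sketch `Coder` differs in every row, but because it is listed in `exclude` it does not count towards `n_diff`.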

Details

This function uses dplyr::anti_join to find rows in .data1 that are not present in .data2. If no differences are found, a message is printed and NULL is returned. If differences are found, they are returned as a data frame.
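The anti-join behaviour described above can be sketched as follows, using the data frames from the Examples section. The helper `report_new()` is hypothetical, written only to mirror the described message-and-NULL behaviour; it is not the package's code, and the message text is an assumption.

```r
library(dplyr)

df1 <- data.frame(ID = 1:5, Value = letters[1:5])
df2 <- data.frame(ID = 3:7, Value = letters[3:7])

# Rows of df1 whose ID has no match in df2
only_in_df1 <- dplyr::anti_join(df1, df2, by = "ID")
only_in_df1

# Hypothetical wrapper: print a message and return NULL when nothing differs
report_new <- function(x, y, by = "ID") {
  out <- dplyr::anti_join(x, y, by = by)
  if (nrow(out) == 0) {
    message("No differences found.")
    return(NULL)
  }
  out
}

report_new(df1, df1)
```

Note that an anti-join is directional: rows present only in the second dataset are not reported unless the arguments are swapped.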

Value

A data frame with the differences found, or NULL (with a message) if no differences are found.

Examples

## Not run: 
df1 <- data.frame(ID = 1:5, Value = letters[1:5])
df2 <- data.frame(ID = 3:7, Value = letters[3:7])
compare_new(df1, df2)
compare_new(df1, df1)

## End(Not run)
compare_diff(emperors$Wikipedia, emperors$Britannica)

manydata documentation built on Nov. 5, 2025, 7:23 p.m.