compare_diff: Compare two datasets for differences

View source: R/compare_diff.R

compare_diff    R Documentation

Description

compare_new() and compare_diff() compare two datasets, matched on a key column, and report the differences between them.

Usage

compare_new(.data1, .data2, by = "ID")

compare_diff(
  .data1,
  .data2,
  by = "ID",
  exclude = c("Title", "Coder", "Comments"),
  diff_threshold = 0
)

Arguments

.data1

First dataset to compare

.data2

Second dataset to compare

by

Column name to join on (default is "ID")

exclude

Character vector of column names to exclude from comparison. By default, "Title", "Coder", and "Comments" are excluded.

diff_threshold

Integer specifying the minimum number of differing columns a row must have to be included in the output. Default is 0, meaning any difference will be included. For example, set it to 3 to show only rows with at least 3 differing columns.
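The effect of exclude and diff_threshold can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: count_diffs() is a hypothetical helper, the columns A, B and C are made up, and a cell is assumed to differ when the two values are unequal or exactly one of them is missing.

```r
# Hypothetical helper (not part of manydata): counts, for each key present in
# both datasets, how many compared columns hold different values.
count_diffs <- function(d1, d2, by = "ID", exclude = character(0)) {
  # Compare only columns shared by both datasets, minus the key and exclusions
  cols <- setdiff(intersect(names(d1), names(d2)), c(by, exclude))
  # Align both datasets on the keys they have in common
  ids <- intersect(d1[[by]], d2[[by]])
  a <- d1[match(ids, d1[[by]]), cols, drop = FALSE]
  b <- d2[match(ids, d2[[by]]), cols, drop = FALSE]
  # Assumption: a cell differs if values are unequal or exactly one is NA
  differs <- mapply(function(x, y) {
    (is.na(x) != is.na(y)) | (!is.na(x) & !is.na(y) & x != y)
  }, a, b)
  differs <- matrix(differs, nrow = length(ids))
  out <- data.frame(ids, rowSums(differs))
  names(out) <- c(by, "n_diff")
  out
}

d1 <- data.frame(ID = 1:3, A = c("x", "y", "z"), B = c(1, 2, 3),
                 C = c("p", "q", "r"))
d2 <- data.frame(ID = 1:3, A = c("x", "y", "w"), B = c(1, 5, 4),
                 C = c("p", "q", "s"))

counts <- count_diffs(d1, d2)
# Keep only rows with at least 3 differing columns (here, only ID 3)
flagged <- counts[counts$n_diff >= 3, ]
# Excluding column C lowers the count for ID 3 from 3 to 2
counts_excl <- count_diffs(d1, d2, exclude = "C")
```

Under these assumptions, a threshold of 3 keeps only the row with ID 3, and excluding C would drop that row below the threshold.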

Details

This function uses dplyr::anti_join to find rows in .data1 that are not present in .data2. If no differences are found, a message is printed and NULL is returned. If differences are found, they are returned as a data frame.
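As a minimal sketch of the join named above, the following calls dplyr::anti_join() directly on two small data frames. It shows only how the anti-join behaves; how the package wraps it (the message and the NULL return) is not reproduced here.

```r
library(dplyr)

df1 <- data.frame(ID = 1:5, Value = letters[1:5])
df2 <- data.frame(ID = 3:7, Value = letters[3:7])

# Rows of df1 whose ID has no match in df2 (IDs 1 and 2)
only_in_df1 <- anti_join(df1, df2, by = "ID")

# Anti-joining a dataset against itself leaves zero rows
self_diff <- anti_join(df1, df1, by = "ID")
nrow(self_diff)
```

Note that the anti-join is asymmetric: swapping the two arguments would instead return the rows of df2 that are missing from df1.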

Value

A data frame with the differences found, or NULL (with a message) if no differences are found.

Examples

## Not run: 
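# Two small datasets that overlap on IDs 3 to 5
df1 <- data.frame(ID = 1:5, Value = letters[1:5])
df2 <- data.frame(ID = 3:7, Value = letters[3:7])
# Compare the two datasets on the default "ID" column
compare_new(df1, df2)
# Compare a dataset with itself
compare_new(df1, df1)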

## End(Not run)
compare_diff(emperors$Wikipedia, emperors$Britannica)

manydata documentation built on Nov. 5, 2025, 7:23 p.m.