View source: R/compare_datasets.R
| compare_datasets | R Documentation |
Compares two datasets at three levels in a single call:
- Dataset level – dimensions, column overlap, missing-value totals.
- Variable level – column name discrepancies and data-type mismatches (delegates to compare_variables()).
- Observation level – row-by-row value differences on common columns. Uses positional matching by default, or key-based matching when id_vars is provided.
The return value is a list with class "dataset_comparison", which has
a tidy print() method. The same object is accepted by
generate_summary_report(), generate_detailed_report(), and
compare_by_group().
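To give a rough feel for what the dataset-level and variable-level checks cover, here is a base R sketch. It is an approximation written for illustration only, not the package's implementation: the helper name sketch_dataset_level() and the names of its list elements are hypothetical, and the real function returns the structure documented under the return value.

```r
# Hypothetical helper: a base R approximation of the dataset-level and
# variable-level checks. This is NOT the package's internal code.
sketch_dataset_level <- function(df1, df2) {
  common <- intersect(names(df1), names(df2))
  # First class of each common column, per dataset
  cls1 <- vapply(df1[common], function(x) class(x)[1], character(1))
  cls2 <- vapply(df2[common], function(x) class(x)[1], character(1))
  list(
    dims_df1        = dim(df1),
    dims_df2        = dim(df2),
    common_columns  = common,
    extra_in_df1    = setdiff(names(df1), names(df2)),
    extra_in_df2    = setdiff(names(df2), names(df1)),
    type_mismatches = common[cls1 != cls2],
    na_total_df1    = sum(is.na(df1)),
    na_total_df2    = sum(is.na(df2))
  )
}

df1 <- data.frame(id = 1:3, val = c(10, NA, 30), grp = c("a", "b", "c"))
df2 <- data.frame(id = c(1, 2, 3), val = c(10, 25, 30),
                  flag = c(TRUE, FALSE, TRUE))

summary_sketch <- sketch_dataset_level(df1, df2)
summary_sketch$type_mismatches  # "id" (integer in df1, numeric in df2)
summary_sketch$extra_in_df1     # "grp"
summary_sketch$extra_in_df2     # "flag"
```

Note that 1:3 is stored as integer while c(1, 2, 3) is numeric, which is the kind of difference a type-mismatch check is meant to surface.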
compare_datasets(df1, df2, tolerance = 0, vars = NULL, id_vars = NULL)
df1 |
A data frame (the base dataset). |
df2 |
A data frame (the compare dataset). |
tolerance |
Numeric tolerance value for floating-point comparisons (default 0). When tolerance > 0, numeric values are considered equal if their absolute difference is within the tolerance threshold. Character and factor columns always use exact matching regardless of tolerance. |
vars |
Optional character vector of variable names to compare. When provided, only these columns are included in the observation-level comparison. Structural comparison (extra columns, type mismatches) still covers all columns. Default is NULL (compare all common columns). |
id_vars |
Optional character vector of column names to use as matching
keys. When provided, rows are matched by these key columns instead of by
position. This allows comparison of datasets with different row counts or
different row orders. Rows that exist in only one dataset are reported in
|
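The tolerance rule described for the tolerance argument can be sketched in base R as follows. This is a minimal illustration of the documented behaviour (absolute difference for numeric values, exact matching for character and factor values), assuming no missing values. The helper values_equal() is hypothetical and is not part of the package.

```r
# Hypothetical helper illustrating the documented tolerance rule;
# not the package's internal code. NA handling is deliberately omitted.
values_equal <- function(x, y, tolerance = 0) {
  if (is.numeric(x) && is.numeric(y)) {
    # Numeric: equal when the absolute difference is within tolerance
    abs(x - y) <= tolerance
  } else {
    # Character and factor: exact matching, tolerance is ignored
    as.character(x) == as.character(y)
  }
}

a <- c(0.1 + 0.2, 1.0, 5)
b <- c(0.3,       1.0, 5.2)

exact   <- values_equal(a, b)                    # FALSE TRUE FALSE
relaxed <- values_equal(a, b, tolerance = 1e-8)  # TRUE  TRUE FALSE
strings <- values_equal(c("x", "y"), c("x", "Y"), tolerance = 1)  # TRUE FALSE
```

With tolerance = 0, the first pair is reported as different because 0.1 + 0.2 is not exactly 0.3 in floating-point arithmetic. A small positive tolerance absorbs that rounding error while still flagging the real difference between 5 and 5.2.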
A dataset_comparison list containing:
nrow_df1, ncol_df1 |
Dimensions of df1. |
nrow_df2, ncol_df2 |
Dimensions of df2. |
common_columns |
Character vector of columns present in both. |
extra_in_df1 |
Columns only in df1. |
extra_in_df2 |
Columns only in df2. |
type_mismatches |
Data frame of columns whose class differs
(columns: |
missing_values |
Data frame summarising NA counts per column per
dataset (columns: |
variable_comparison |
Output of |
observation_comparison |
Output of |
id_vars |
Character vector of key columns used for matching, or
|
unmatched_rows |
List with |
# Positional matching (default)
df1 <- data.frame(id = 1:3, val = c(10, 20, 30))
df2 <- data.frame(id = 1:3, val = c(10, 25, 30))
result <- compare_datasets(df1, df2)
result
# Key-based matching (for different row counts or row orders)
df1 <- data.frame(id = c(1, 2, 3), val = c(10, 20, 30))
df2 <- data.frame(id = c(2, 3, 4), val = c(20, 35, 40))
result <- compare_datasets(df1, df2, id_vars = "id")
result
result$unmatched_rows
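For readers who want to see what key-based matching amounts to, the sketch below reproduces the idea for the data above using only base R. It assumes a single key column named id and is not the package's implementation. In particular, it does not reproduce the structure of result$unmatched_rows; the object names are illustrative only.

```r
df1 <- data.frame(id = c(1, 2, 3), val = c(10, 20, 30))
df2 <- data.frame(id = c(2, 3, 4), val = c(20, 35, 40))

# Keys present in only one dataset
only_in_df1 <- setdiff(df1$id, df2$id)  # 1
only_in_df2 <- setdiff(df2$id, df1$id)  # 4

# Inner join on the key; the suffixes mark which dataset a value came from
matched <- merge(df1, df2, by = "id", suffixes = c("_df1", "_df2"))

# Among matched keys, keep the rows whose values differ
differing <- matched[matched$val_df1 != matched$val_df2, ]
differing$id  # 3 (val is 30 in df1 and 35 in df2)
```

Because rows are paired by key rather than by position, the two data frames may differ in row order or row count without every row being flagged as different.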