compare_data: Compare structures of two datasets

View source: R/compare_data.R

compare_data    R Documentation

Compare structures of two datasets

Description

This function extracts the structures of two data.frames and compares them, issuing a series of diagnostics.

Usage

compare_data(ref, x, ...)

## Default S3 method:
compare_data(ref, x, ...)

## S3 method for class 'data_structure'
compare_data(
  ref,
  x,
  use_dim = TRUE,
  use_names = TRUE,
  use_classes = TRUE,
  use_values = TRUE,
  columns = TRUE,
  ...
)

## S3 method for class 'data.frame'
compare_data(ref, x, ...)

## S3 method for class 'data_comparison'
print(x, ..., common_values = TRUE, diff_only = TRUE)

Arguments

ref

the reference data.frame

x

a data.frame to be checked

...

further arguments passed to other methods

use_dim

a logical indicating if dataset dimensions should be compared

use_names

a logical indicating if names of the variables should be compared

use_classes

a logical indicating if classes of the variables should be compared

use_values

a logical indicating if values of matching categorical variables should be compared

columns

the names or indices of columns to compare. Defaults to TRUE, which keeps all columns.

common_values

when TRUE (default), common values are printed. When FALSE, common values are suppressed.

diff_only

when TRUE (default), only differences between ref and the current data content are presented; similarities, including common values, are hidden.
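
The columns default of TRUE can be read through ordinary data.frame subsetting, where a logical TRUE is recycled across every column. The snippet below uses only base R to show the three kinds of selector the argument accepts (TRUE, names, indices). Whether compare_data() subsets its inputs in exactly this way internally is an assumption.

```r
# Base R column subsetting with the three kinds of selector
all_cols <- iris[, TRUE, drop = FALSE]       # TRUE is recycled: every column is kept
by_name  <- iris[, "Species", drop = FALSE]  # select a column by name
by_index <- iris[, c(1, 5), drop = FALSE]    # select columns by position

names(by_index)
```

Using drop = FALSE keeps the result a data.frame even when a single column is selected.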

Details

The comparison relies on checking differences in:

  • names of columns

  • classes of the columns (only the first class is used)

  • values of the categorical variables
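
The three checks listed above can be approximated in base R. The sketch below is not the package's implementation: the helper name sketch_compare and the shape of its return value are hypothetical, and treating factor and character columns as "categorical" is an assumption made for illustration.

```r
# Hypothetical helper: a base R approximation of the checks described above,
# not the code used by compare_data().
sketch_compare <- function(ref, x) {
  # 1. names of columns
  common  <- intersect(names(ref), names(x))
  missing <- setdiff(names(ref), names(x))  # in ref but not in x
  extra   <- setdiff(names(x), names(ref))  # in x but not in ref

  # 2. classes of shared columns, keeping only the first class of each
  first_class <- function(df, cols) {
    vapply(df[cols], function(col) class(col)[1], character(1))
  }
  ref_classes   <- first_class(ref, common)
  x_classes     <- first_class(x, common)
  class_changed <- common[ref_classes != x_classes]

  # 3. values of shared categorical columns
  #    (assumption: "categorical" means factor or character in ref)
  is_cat <- ref_classes %in% c("factor", "character")
  new_values <- lapply(
    common[is_cat],
    function(col) setdiff(unique(as.character(x[[col]])),
                          unique(as.character(ref[[col]])))
  )
  names(new_values) <- common[is_cat]

  list(missing = missing, extra = extra,
       class_changed = class_changed, new_values = new_values)
}

res <- sketch_compare(iris, data.frame(Species = letters,
                                       stringsAsFactors = FALSE))
res$class_changed  # Species is a factor in iris, character in the new data
```

Only class(col)[1] is compared, so a column whose first class matches but whose later classes differ would not be flagged by this sketch.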

Value

an object of class data_comparison: a named list with one element per test

Author(s)

Thibaut Jombart

Examples


## no differences
compare_data(iris, iris)

## different dimensions
compare_data(iris, iris[-1, -2])
compare_data(iris[-1, -2], iris) # inverse

## one variable in common but different class and content
compare_data(iris,
             data.frame(Species = letters,
                        stringsAsFactors = FALSE))

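## Comparing only specific columns

set.seed(1) # make the random 'letter' columns reproducible
iris1 <- iris2 <- iris
iris1$letter <- sample(letters[1:3], nrow(iris), replace = TRUE)
iris2$letter <- sample(letters[1:8], nrow(iris), replace = TRUE)
compare_data(iris1, iris2, columns = "Species") # restrict to Species
compare_data(iris, iris2, columns = "Species")
compare_data(iris, iris1)  # all columns; iris1 has an extra 'letter' column
compare_data(iris1, iris2) # all columns; 'letter' is drawn from different sets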


reconhub/linelist documentation built on Jan. 1, 2023, 9:39 p.m.