compare_numeric_distributions: Compare the distribution of common fields across two data...

Description

Typically you have a data set whose integrity is unknown, and you want to compare it to a data set whose reliability has already been established by other means. This function compares the uncertain data set (the "challenger") to the trusted one (the "baseline") and reports whether the fields they have in common are similarly distributed.

Usage

compare_numeric_distributions(
  challenger,
  baseline,
  summaries = default_numeric_summaries,
  tests = default_numeric_tests(
    tolerance = getOption("vardist.numeric_summary_tolerance", 0.1),
    ks_test_threshold = getOption("vardist.ks_test_threshold", 0.5)
  ),
  parallel = FALSE,
  mc.cores = parallel::detectCores()
)

Arguments

challenger

data.frame. The data set whose integrity is unknown.

baseline

data.frame. The data set whose reliability has already been established.

summaries

list. A named list of summary functions. Each function must take one numeric vector as input and return a numeric vector of length 1.
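
As a minimal sketch of what such a list could look like, the example below builds one by hand and checks the length-1 contract on a single column. The list name and its entries are illustrative assumptions; they are not necessarily the contents of default_numeric_summaries.

```r
# Hypothetical summaries list (names chosen for illustration only).
# Each function takes one numeric vector and returns a single number.
my_summaries <- list(
  mean      = function(x) mean(x, na.rm = TRUE),
  median    = function(x) stats::median(x, na.rm = TRUE),
  sd        = function(x) stats::sd(x, na.rm = TRUE),
  p_missing = function(x) mean(is.na(x))
)

# Apply every summary to one column. vapply() raises an error if any
# function returns something other than a numeric vector of length 1.
x <- c(1, 2, 3, NA, 10)
summary_values <- vapply(my_summaries, function(f) f(x), numeric(1))
```

Using vapply with numeric(1) is a cheap way to validate a custom list before passing it as the summaries argument.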

tests

list. A named list of functions that take columns or summary statistics from the challenger and the baseline and return TRUE or FALSE.
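
The sketch below shows two test functions of this general shape, one working on summary statistics and one on raw columns. The function names, the argument order, and the reading of the two thresholds (a relative tolerance, and a minimum p-value for a Kolmogorov-Smirnov test) are assumptions made for illustration; the signature that default_numeric_tests actually produces is not documented here.

```r
# Hypothetical test on summary statistics: TRUE when the challenger value
# lies within a relative tolerance of the baseline value.
within_tolerance <- function(challenger_stat, baseline_stat, tolerance = 0.1) {
  if (baseline_stat == 0) {
    # Fall back to an absolute comparison to avoid dividing by zero.
    return(abs(challenger_stat) <= tolerance)
  }
  abs(challenger_stat - baseline_stat) / abs(baseline_stat) <= tolerance
}

# Hypothetical test on raw columns: TRUE when a two-sample
# Kolmogorov-Smirnov test yields a p-value at or above the threshold.
# (Treating the threshold as a p-value cutoff is an assumption.)
ks_similar <- function(challenger_col, baseline_col, threshold = 0.5) {
  p <- suppressWarnings(stats::ks.test(challenger_col, baseline_col)$p.value)
  p >= threshold
}

# A named list in the shape the tests argument describes.
my_tests <- list(mean_within_tolerance = within_tolerance, ks = ks_similar)
```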

parallel

logical. If TRUE, use mclapply instead of lapply.

mc.cores

numeric. The number of cores to pass to mclapply.
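
To illustrate how a parallel flag and mc.cores might interact, the sketch below switches between lapply and parallel::mclapply. The helper apply_over_columns is hypothetical and is not part of vardist; it only shows the general pattern.

```r
# Hypothetical helper: pick the apply function based on the parallel flag.
apply_over_columns <- function(columns, f, parallel = FALSE, mc.cores = 1L) {
  if (parallel) {
    # mclapply forks worker processes; on Windows it only supports
    # mc.cores = 1, where it behaves like lapply.
    parallel::mclapply(columns, f, mc.cores = mc.cores)
  } else {
    lapply(columns, f)
  }
}

df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
serial_means   <- apply_over_columns(df, mean)
parallel_means <- apply_over_columns(df, mean, parallel = TRUE, mc.cores = 1L)
```

Both calls return the same named list, so the choice affects only how the work is scheduled, not the result.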

Value

A list with three data frames: the columnwise summaries for the challenger, the columnwise summaries for the baseline, and a report with the results of the tests.
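
The following base-R sketch approximates the shape of such a result on two small data frames. It is not vardist's implementation: the element names (challenger, baseline, report), the choice of summaries, and the 10% relative tolerance rule are all assumptions made for illustration.

```r
# Toy inputs; only the columns present in both data frames are compared.
challenger <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40), only_c = 1:4)
baseline   <- data.frame(x = c(1, 2, 3, 5), y = c(10, 20, 30, 80), only_b = 4:1)

summaries <- list(mean = mean, max = max)
common <- intersect(names(challenger), names(baseline))

# Build one row per common column and one column per summary.
summarise_columns <- function(df) {
  rows <- lapply(df[common], function(col) {
    as.data.frame(lapply(summaries, function(f) f(col)))
  })
  out <- do.call(rbind, rows)
  rownames(out) <- common
  out
}

challenger_summary <- summarise_columns(challenger)
baseline_summary   <- summarise_columns(baseline)

# Report: TRUE where a summary agrees within a 10% relative tolerance.
# Comparing data frames yields a logical matrix, hence as.data.frame().
report <- as.data.frame(
  abs(challenger_summary - baseline_summary) / abs(baseline_summary) <= 0.1
)

result <- list(
  challenger = challenger_summary,
  baseline   = baseline_summary,
  report     = report
)
```

In this toy case only the mean of x passes, because the baseline's larger maximum values push the other summaries outside the tolerance.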

See Also

calculate_summaries, generate_report


avantoss/vardist documentation built on May 24, 2019, 3:03 a.m.