keysPoolCheck: Compares keys from different data sets; finds differences...

View source: R/variableKey.R

keysPoolCheck    R Documentation

Compares keys from different data sets; finds differences in classes of variables. This is used to check for similarity of keys from various data sets, one precursor to either combining the keys or merging the data sets themselves.

Description

When several supposedly "equivalent" data sets are used to generate variable keys, the keys may disagree. If variables with the same name have different classes in different data sets, keyApply might fail when applied to one of the data sets.

Usage

keysPoolCheck(keys, col = "class_old", excludere = "TEXT$")

Arguments

keys

A list with variable keys.

col

Name of key column to check for equivalence. Default is "class_old", but "class_new" can be checked as well.

excludere

A regular expression (re). Variables whose names match it are excluded from the check. The default, "TEXT$", excludes variables whose names end in "TEXT".
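As a quick illustration, and assuming the pattern is tested against variable names in the way base R's grepl() does (an assumption about the internals, not taken from the kutils source), the effect of the default pattern can be previewed directly. The variable names below are hypothetical.

```r
## Which hypothetical variable names would excludere = "TEXT$" drop?
## Assumes matching is done on names, as with base R grepl().
varnames <- c("x1", "x2", "x3_TEXT", "TEXT_x4")
excluded <- grepl("TEXT$", varnames)  # "$" anchors the match at the end
varnames[excluded]   # "x3_TEXT"
varnames[!excluded]  # "x1" "x2" "TEXT_x4"
```

Because of the "$" anchor, a name that merely starts with or contains "TEXT" (such as "TEXT_x4") is still checked.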

Details

This reports on differences in classes among keys. By default, it looks for differences in "class_old", because that's where we usually see trouble.

The output here is diagnostic. The keys can be fixed manually, or the function keysPool can implement an automatic correction.
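The kind of comparison described above can be sketched in base R. This is a minimal illustration, not the kutils implementation: the helper classDiffSketch() is hypothetical, and the toy keys keep only two columns, name_old plus the class column being checked, assumed to mirror the layout of a variable key.

```r
## Hypothetical toy keys; real keys come from keyTemplate() and have more columns.
key1 <- data.frame(name_old  = c("x1", "x2", "x3_TEXT", "x4"),
                   class_old = c("numeric", "character", "character", "integer"),
                   stringsAsFactors = FALSE)
key2 <- data.frame(name_old  = c("x1", "x2", "x3_TEXT"),
                   class_old = c("numeric", "character", "integer"),
                   stringsAsFactors = FALSE)

## Hypothetical helper: list variables whose value in `col` differs across keys.
classDiffSketch <- function(keys, col = "class_old", excludere = "TEXT$") {
  ## Stack name and class from every key, tagging each row with its key index
  long <- do.call(rbind, lapply(seq_along(keys), function(i) {
    data.frame(name  = keys[[i]][["name_old"]],
               value = keys[[i]][[col]],
               key   = i, stringsAsFactors = FALSE)
  }))
  ## Drop variables whose names match the exclusion pattern
  if (!is.null(excludere)) long <- long[!grepl(excludere, long$name), ]
  ## Count distinct classes per variable; more than one signals a conflict
  nclass <- tapply(long$value, long$name, function(v) length(unique(v)))
  bad <- names(nclass)[nclass > 1]
  long[long$name %in% bad, ]
}

classDiffSketch(list(key1, key2))                    # no conflicts: x3_TEXT excluded
classDiffSketch(list(key1, key2), excludere = NULL)  # x3_TEXT: character vs integer
```

A variable that appears in only one key (x4 here) is not reported, because a single class cannot conflict with itself.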

Value

A data.frame summarizing class differences among keys.

Author(s)

Paul Johnson

Examples

set.seed(234)
dat1 <- data.frame(x1 = rnorm(100),
                   x2 = sample(c("Male", "Female"), 100, replace = TRUE),
                   x3_TEXT = "A", x4 = sample(1:10000, 100))
dat2 <- data.frame(x1 = rnorm(100),
                   x2 = sample(c("Male", "Female"), 100, replace = TRUE),
                   x3_TEXT = sample(1:100, 100),
                   stringsAsFactors = FALSE)
key1 <- keyTemplate(dat1)
key2 <- keyTemplate(dat2)
keys <- list(key1, key2)
keysPoolCheck(keys)
## See problem in class_old
keysPoolCheck(keys, col = "class_old")
## problems in class_new
keysPoolCheck(keys, col = "class_new")
keysPoolCheck(keys, excludere = "TEXT$")

kutils documentation built on Sept. 17, 2023, 5:06 p.m.