mergeCheck: First draft of function to diagnose problems in merges and...

View source: R/mergeCheck.R

Description

This is a first draft. It works with two data frames and one key variable in each. The by parameter must name a single column; multi-column keys are not supported yet, though they may be in the future. The return value is a list that includes full copies of the rows from the data frames in which trouble is observed.

Usage

mergeCheck(x, y, by, by.x = by, by.y = by, incomparables = c(NULL,
  NA, NaN, Inf, "\\s+", ""))

Arguments

x

data frame

y

data frame

by

Commonly called the "key" variable. A column name to be used for merging (common to both x and y)

by.x

Column name in x to be used for merging. If not supplied, then by.x is assumed to be same as by.

by.y

Column name in y to be used for merging. If not supplied, then by.y is assumed to be same as by.

incomparables

Values in the key (by) variable that are ignored for matching. By default these values are treated as incomparables: c(NULL, NA, NaN, Inf, "\\s+", ""). Note that this is a larger set of incomparables than base R's merge uses (its default is incomparables = NULL).
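To illustrate why the choice of incomparables matters, here is a minimal base R sketch using match(), the function that the merge documentation refers to for this argument. It does not call mergeCheck, and keysX and keysY are made-up vectors used only for illustration.

```r
## Hypothetical key vectors; not taken from the package examples.
keysX <- c(1, 2, NA)
keysY <- c(2, NA, 3)

## With no incomparables, a missing key in keysX is paired with the
## missing key in keysY, as if NA were an ordinary value.
loose <- match(keysX, keysY)

## Declaring NA incomparable prevents missing keys from pairing up.
strict <- match(keysX, keysY, incomparables = NA)

print(loose)
print(strict)
```

In the first call the NA key finds a partner; in the second it does not, which is usually the safer behavior when missing keys carry no identifying information.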

Value

A list of the data structures that are displayed for keys and data sets: list(keysBad, keysDuped, unmatched). The unmatched element is itself a list with two elements, the unmatched cases from x and the unmatched cases from y.
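The three kinds of trouble named in the return value can be sketched in base R as follows. This is not the kutils implementation: diagnoseKeys is a hypothetical helper, its definition of a "bad" key (missing, infinite, empty, or whitespace-only) is an assumption based on the default incomparables, and the x/y sub-lists for keysBad and keysDuped are an assumed layout.

```r
## Hypothetical helper, NOT the kutils code: a base R sketch of the checks.
diagnoseKeys <- function(x, y, by.x, by.y = by.x) {
    ## Assumed rule: a key is "bad" if it is NA/NaN, infinite, empty,
    ## or made up only of whitespace.
    isBad <- function(k) {
        bad <- is.na(k)
        if (is.numeric(k)) bad <- bad | is.infinite(k)
        bad | grepl("^\\s*$", as.character(k))
    }
    ## Rows whose (usable) key occurs more than once in the same data frame.
    dupRows <- function(df, k, bad) {
        dup <- !bad & (duplicated(k) | duplicated(k, fromLast = TRUE))
        df[dup, , drop = FALSE]
    }
    kx <- x[[by.x]]
    ky <- y[[by.y]]
    badx <- isBad(kx)
    bady <- isBad(ky)
    list(keysBad = list(x = x[badx, , drop = FALSE],
                        y = y[bady, , drop = FALSE]),
         keysDuped = list(x = dupRows(x, kx, badx),
                          y = dupRows(y, ky, bady)),
         ## Rows with a usable key that has no partner in the other data frame.
         unmatched = list(x = x[!badx & !(kx %in% ky[!bady]), , drop = FALSE],
                          y = y[!bady & !(ky %in% kx[!badx]), , drop = FALSE]))
}

## Small made-up data: one missing key, one duplicated key, one stray key each.
dfa <- data.frame(id = c(1, 2, 3, 3, NA), x = 1:5)
dfb <- data.frame(id = c(2, 3, 9), z = 1:3)
res <- diagnoseKeys(dfa, dfb, by.x = "id")
str(res, max.level = 2)
```

Returning full rows rather than bare key values mirrors the description above, which says the list includes full copies of the offending rows.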

Author(s)

Paul Johnson

Examples

## Keys that appear in only one of the two data frames
df1 <- data.frame(id = 1:7, x = rnorm(7))
df2 <- data.frame(id = c(2:6, 9:10), x = rnorm(7))
mc1 <- mergeCheck(df1, df2, by = "id")
## Use mc1 objects mc1$keysBad, mc1$keysDuped, mc1$unmatched

## Missing, empty, and blank keys in df1 (c() coerces these ids to
## character); repeated keys 5 and 6 in df2
df1 <- data.frame(id = c(1:3, NA, NaN, "", " "), x = rnorm(7))
df2 <- data.frame(id = c(2:6, 5:6), x = rnorm(7))
mergeCheck(df1, df2, by = "id")

## Key column has a different name in each data frame
df1 <- data.frame(idx = c(1:5, NA, NaN), x = rnorm(7))
df2 <- data.frame(idy = c(2:6, 9:10), x = rnorm(7))
mergeCheck(df1, df2, by.x = "idx", by.y = "idy")

kutils documentation built on April 30, 2020, 1:05 a.m.