matchmaker R package

Lifecycle: experimental

The goal of {matchmaker} is to provide dictionary-based cleaning for R users in a simple and intuitive manner built on the {forcats} package. Some of the features of this package include:

Installation

You can install {matchmaker} from CRAN:

install.packages("matchmaker")

Example

The matchmaker package has two user-facing functions that perform dictionary-based cleaning:

Each of these functions has four mandatory options:

Mostly, users will be working with match_df() to transform values across specific columns. A typical workflow would be to:

  1. construct your dictionary in a spreadsheet program based on your data
  2. read in your data and dictionary to data frames in R
  3. match!

library("matchmaker")

# Read in data set
dat <- read.csv(matchmaker_example("coded-data.csv"),
  stringsAsFactors = FALSE
)
dat$date <- as.Date(dat$date)

# Read in dictionary
dict <- read.csv(matchmaker_example("spelling-dictionary.csv"),
  stringsAsFactors = FALSE
)
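
Step 1 of the workflow builds the dictionary in a spreadsheet. For a small case, the same four-column layout can also be typed directly in R. This is a minimal sketch that reuses the column names from the example dictionary (`options`, `values`, `grp`, `orders`); `small_dict` is a name made up for illustration, and only the readmission entries are reproduced.

```r
# A hand-built dictionary with the same columns as the CSV example.
# `small_dict` is a hypothetical name used only for this sketch.
small_dict <- data.frame(
  options = c("y", "n", "u", ".missing"),       # values as found in the data
  values  = c("Yes", "No", "Unknown", "Missing"), # values to replace them with
  grp     = "readmission",                        # column the entries apply to
  orders  = 1:4,                                  # order of the replacement values
  stringsAsFactors = FALSE
)
small_dict
```

Because this is an ordinary data frame, just like the one returned by read.csv(), it should be usable wherever the CSV-based dictionary is.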

Data

This is the top of our data set, generated for example purposes:

| id     | date       | readmission | treated | facility | age_group | lab_result_01 | lab_result_02 | lab_result_03 | has_symptoms | followup |
| :----- | :--------- | :---------- | ------: | :------- | --------: | :------------ | :------------ | :------------ | :----------- | :------- |
| ef267c | 2019-07-08 | NA          |       0 | C        |        10 | unk           | high          | inc           | NA           | u        |
| e80a37 | 2019-07-07 | y           |       0 | 3        |        10 | inc           | unk           | norm          | y            | oui      |
| b72883 | 2019-07-07 | y           |       1 | 8        |        30 | inc           | norm          | inc           |              | oui      |
| c9ee86 | 2019-07-09 | n           |       1 | 4        |        40 | inc           | inc           | unk           | y            | oui      |
| 40bc7a | 2019-07-12 | n           |       1 | 6        |         0 | norm          | unk           | norm          | NA           | n        |
| 46566e | 2019-07-14 | y           |      NA | B        |        50 | unk           | unk           | inc           | NA           | NA       |

Dictionary

The dictionary looks like this:

| options  | values       | grp                 | orders |
| :------- | :----------- | :------------------ | -----: |
| y        | Yes          | readmission         |      1 |
| n        | No           | readmission         |      2 |
| u        | Unknown      | readmission         |      3 |
| .missing | Missing      | readmission         |      4 |
| 0        | Yes          | treated             |      1 |
| 1        | No           | treated             |      2 |
| .missing | Missing      | treated             |      3 |
| 1        | Facility 1   | facility            |      1 |
| 2        | Facility 2   | facility            |      2 |
| 3        | Facility 3   | facility            |      3 |
| 4        | Facility 4   | facility            |      4 |
| 5        | Facility 5   | facility            |      5 |
| 6        | Facility 6   | facility            |      6 |
| 7        | Facility 7   | facility            |      7 |
| 8        | Facility 8   | facility            |      8 |
| 9        | Facility 9   | facility            |      9 |
| 10       | Facility 10  | facility            |     10 |
| .default | Unknown      | facility            |     11 |
| 0        | 0-9          | age_group           |      1 |
| 10       | 10-19        | age_group           |      2 |
| 20       | 20-29        | age_group           |      3 |
| 30       | 30-39        | age_group           |      4 |
| 40       | 40-49        | age_group           |      5 |
| 50       | 50+          | age_group           |      6 |
| high     | High         | .regex ^lab_result_ |      1 |
| norm     | Normal       | .regex ^lab_result_ |      2 |
| inc      | Inconclusive | .regex ^lab_result_ |      3 |
| y        | yes          | .global             |    Inf |
| n        | no           | .global             |    Inf |
| u        | unknown      | .global             |    Inf |
| unk      | unknown      | .global             |    Inf |
| oui      | yes          | .global             |    Inf |
| .missing | missing      | .global             |    Inf |
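
The `options` column contains two keywords, `.missing` and `.default`, that are not ordinary data values. Judging only from the example data and the cleaned output in the Matching section, `.missing` appears to stand in for `NA` and empty strings, and `.default` for any value that has no entry of its own. The base-R sketch below mimics that behaviour for a single column. `recode_with_dict()` is a hypothetical helper written for illustration; it is not part of {matchmaker} and is not a description of how the package is implemented.

```r
# Hypothetical helper (not part of {matchmaker}): recode one vector using
# parallel `from`/`to` vectors, with assumed `.missing`/`.default` semantics.
recode_with_dict <- function(x, from, to) {
  x <- as.character(x)
  out <- x

  # Exact matches: position of each value of x in `from` (NA if absent)
  idx <- match(x, from)
  hit <- !is.na(idx) & !is.na(x)
  out[hit] <- to[idx[hit]]

  # Assumed meaning of `.missing`: replaces NA and empty strings
  if (".missing" %in% from) {
    out[is.na(x) | x == ""] <- to[from == ".missing"][1]
  }

  # Assumed meaning of `.default`: replaces non-missing values without a match
  if (".default" %in% from) {
    unmatched <- !hit & !is.na(x) & x != ""
    out[unmatched] <- to[from == ".default"][1]
  }
  out
}

# Entries copied from the "facility" and "readmission" groups above
facility_from <- c(as.character(1:10), ".default")
facility_to   <- c(paste("Facility", 1:10), "Unknown")
facility_out  <- recode_with_dict(c("C", "3", "8", "B"), facility_from, facility_to)

readmission_from <- c("y", "n", "u", ".missing")
readmission_to   <- c("Yes", "No", "Unknown", "Missing")
readmission_out  <- recode_with_dict(c(NA, "y", "n", ""), readmission_from, readmission_to)
```

Here `facility_out` sends the unlisted codes "C" and "B" to "Unknown", and `readmission_out` sends both `NA` and the empty string to "Missing".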

Matching

# Clean spelling based on dictionary -----------------------------
cleaned <- match_df(dat,
  dictionary = dict,
  from = "options",
  to = "values",
  by = "grp"
)
head(cleaned)
#>       id       date readmission treated    facility age_group
#> 1 ef267c 2019-07-08     Missing     Yes     Unknown     10-19
#> 2 e80a37 2019-07-07         Yes     Yes Facility  3     10-19
#> 3 b72883 2019-07-07         Yes      No Facility  8     30-39
#> 4 c9ee86 2019-07-09          No      No Facility  4     40-49
#> 5 40bc7a 2019-07-12          No      No Facility  6       0-9
#> 6 46566e 2019-07-14         Yes Missing     Unknown       50+
#>   lab_result_01 lab_result_02 lab_result_03 has_symptoms followup
#> 1       unknown          High  Inconclusive      missing  unknown
#> 2  Inconclusive       unknown        Normal          yes      yes
#> 3  Inconclusive        Normal  Inconclusive      missing      yes
#> 4  Inconclusive  Inconclusive       unknown          yes      yes
#> 5        Normal       unknown        Normal      missing       no
#> 6       unknown       unknown  Inconclusive      missing  missing
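
The `grp` column decides which dictionary entries reach which data column. From the output above, a plain name such as `facility` seems to target that one column, `.regex ^lab_result_` every column whose name matches the pattern, and `.global` every column (`followup` has no group of its own yet is still recoded). The sketch below only illustrates that scoping rule as inferred from this example; `groups_for_column()` is a hypothetical helper, not a {matchmaker} function.

```r
# Hypothetical helper (not part of {matchmaker}): list the dictionary groups
# that would apply to a given column name under the inferred scoping rule.
groups_for_column <- function(column, grp) {
  keys <- unique(grp)
  is_regex <- startsWith(keys, ".regex ")

  # Exact column name or the `.global` keyword
  applies <- keys == column | keys == ".global"

  # `.regex <pattern>` groups: strip the keyword, test the pattern on the name
  patterns <- sub("^\\.regex ", "", keys[is_regex])
  applies[is_regex] <- vapply(
    patterns,
    function(p) grepl(p, column),
    logical(1),
    USE.NAMES = FALSE
  )
  keys[applies]
}

# Group labels as they appear in the example dictionary
grp <- c("readmission", "treated", "facility", "age_group",
         ".regex ^lab_result_", ".global")

lab_groups      <- groups_for_column("lab_result_02", grp)
followup_groups <- groups_for_column("followup", grp)
facility_groups <- groups_for_column("facility", grp)
```

Where a column-specific group and `.global` both have an entry for the same value, the output above suggests the column-specific one wins: `y` in `readmission` becomes "Yes" rather than "yes".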


reconhub/matchmaker documentation built on Feb. 28, 2020, noon