In matchmaker: Flexible Dictionary-Based Cleaning

Example

The matchmaker package has two user-facing functions that perform dictionary-based cleaning:

match_vec() will translate the values in a single vector
match_df() will translate values in all specified columns of a data frame

Each of these functions have four manditory options:

x: your data. This will be a vector or data frame depending on the function.
dictionary: This is a data frame with at least two columns specifying keys and values to modify
from: a character or number specifying which column contains the keys
to: a character or number specifying which column contains the values

Mostly, users will be working with match_df() to transform values across specific columns. A typical workflow would be to:

construct your dictionary in a spreadsheet program based on your data
read in your data and dictionary to data frames in R
match!

library("matchmaker")

# Read in data set
dat <- read.csv(matchmaker_example("coded-data.csv"),
  stringsAsFactors = FALSE
)
dat$date <- as.Date(dat$date)

# Read in dictionary
dict <- read.csv(matchmaker_example("spelling-dictionary.csv"),
  stringsAsFactors = FALSE
)

Data

This is the top of our data set, generated for example purposes

knitr::kable(head(dat))

Dictionary

The dictionary looks like this:

knitr::kable(dict)

Matching

# Clean spelling based on dictionary -----------------------------
cleaned <- match_df(dat,
  dictionary = dict,
  from = "options",
  to = "values",
  by = "grp"
)
head(cleaned)