Example

The matchmaker package has two user-facing functions that perform dictionary-based cleaning:

Each of these functions have four manditory options:

Mostly, users will be working with match_df() to transform values across specific columns. A typical workflow would be to:

  1. construct your dictionary in a spreadsheet program based on your data
  2. read in your data and dictionary to data frames in R
  3. match!
library("matchmaker")

# Read in data set
dat <- read.csv(matchmaker_example("coded-data.csv"),
  stringsAsFactors = FALSE
)
dat$date <- as.Date(dat$date)

# Read in dictionary
dict <- read.csv(matchmaker_example("spelling-dictionary.csv"),
  stringsAsFactors = FALSE
)

Data

This is the top of our data set, generated for example purposes

knitr::kable(head(dat))

Dictionary

The dictionary looks like this:

knitr::kable(dict)

Matching

# Clean spelling based on dictionary -----------------------------
cleaned <- match_df(dat,
  dictionary = dict,
  from = "options",
  to = "values",
  by = "grp"
)
head(cleaned)


Try the matchmaker package in your browser

Any scripts or data that you put into this service are public.

matchmaker documentation built on Feb. 22, 2020, 1:11 a.m.