knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(laycUtils)

The fuzzy_join() function merges datasets that share no common id variable. For instance, two datasets may both contain first name and last name information, but typos and spelling discrepancies prevent an exact match on those fields.

STEP 1: Load the data sets to be merged

data(eto)
head(eto)

data(nwea)
head(nwea)

Both data sets contain first name and last name. We will create a custom id variable based on these names.

STEP 2: Create custom id variable

eto$my_id <- create_id(eto, var = c('lname', 'fname'))
head(eto)

nwea$my_id <- create_id(nwea, var = c('StudentLastName', 'StudentFirstName'))
head(nwea)
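
To give a rough idea of what a name-based id can look like, here is a minimal sketch in base R. The helper make_name_id() is hypothetical and is not the laycUtils implementation of create_id(); it simply assumes the id is built by lowercasing each name, dropping anything that is not a plain letter, and pasting the pieces together.

```r
# Hypothetical helper for illustration only; create_id() may work differently.
make_name_id <- function(df, var) {
  parts <- lapply(var, function(v) {
    x <- tolower(trimws(as.character(df[[v]])))
    # Drop spaces, hyphens, apostrophes and digits.
    # Note: accented letters are dropped too in this simple version.
    gsub("[^a-z]", "", x)
  })
  # Join the cleaned name parts with an underscore, row by row.
  do.call(paste, c(parts, sep = "_"))
}

# Toy data (made up for this example).
toy <- data.frame(lname = c("O'Neil ", "Smith"),
                  fname = c("Mary-Ann", " JOHN"),
                  stringsAsFactors = FALSE)

toy_id <- make_name_id(toy, var = c("lname", "fname"))
toy_id
#> [1] "oneil_maryann" "smith_john"
```

Normalizing case and punctuation before matching removes the cheapest sources of mismatch, so that the fuzzy step only has to deal with genuine typos.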

STEP 3: Merge both datasets

df <- fuzzy_join(x = nwea, y = eto, by = 'my_id')

head(df)

Both data sets have been merged. A new variable, match_status, identifies whether each record was matched perfectly, matched partially, or left unmatched.
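
The idea behind the three match statuses can be sketched with base R's adist(), which computes edit distances between strings. This is only an illustration under stated assumptions: match_ids(), the max_dist threshold, and the exact status labels are hypothetical, and the real fuzzy_join() may use a different matching strategy.

```r
# Hypothetical sketch; not the laycUtils implementation of fuzzy_join().
match_ids <- function(x_id, y_id, max_dist = 2) {
  # Edit-distance matrix: one row per x_id, one column per y_id.
  d <- utils::adist(x_id, y_id)
  # Index of the closest y_id for each x_id, and the distance to it.
  best <- apply(d, 1, which.min)
  best_d <- d[cbind(seq_along(x_id), best)]
  # Distance 0 is an exact match; small distances count as partial matches.
  status <- ifelse(best_d == 0, "perfect",
                   ifelse(best_d <= max_dist, "partial", "unmatched"))
  data.frame(x_id = x_id,
             y_id = ifelse(status == "unmatched", NA_character_, y_id[best]),
             match_status = status,
             stringsAsFactors = FALSE)
}

# Toy ids (made up for this example).
res <- match_ids(c("smith_john", "oneil_maryann", "lee_ana"),
                 c("smith_john", "oneill_maryann", "garcia_luis"))
res
```

In this toy example the first id matches exactly, the second differs by one character and is treated as a partial match, and the third has no close counterpart. With real output from fuzzy_join(), table(df$match_status) gives a quick count of each category, which is a useful check before reviewing the partial matches by hand.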

"Thanks @psychemedia for most of the code"



thelayc/laycUtils documentation built on May 31, 2019, 9:17 a.m.