Description
This function matches cases between linelists on specified columns using user-specified matching thresholds.
Arguments
x: Linelist 1 as a dataframe.
y: Linelist 2 as a dataframe.
by: Columns on which to match cases. This can be a character vector of column names found in both linelists; a 2-column integer matrix in which each row gives a pair of column positions to be matched (first column for linelist 1, second column for linelist 2); or a 2-column character matrix in which each row gives the names of a pair of columns to be matched, in the same order.
max_dist: A numeric vector giving the cutoff distance for fuzzy matching of each column pair. This can be a single value used for all column pairs, or one value per column pair. Distances between numeric columns are calculated as the absolute difference between values, distances between Date columns as the absolute difference in number of days, and distances between character columns using the …
match_fun: An optional list of functions for custom evaluation of matches. Each function must accept two vectors as arguments and return a logical vector of the same length, indicating whether each comparison is a match. The list must be the same length as max_dist.
output: If "index", returns a dataframe of matched indices between the linelists; if "merged", returns a merged linelist.
mode: The type of join used when returning a merged linelist. One of "inner", "left", "right", "full", "semi" or "anti".
Value
A dataframe of matching indices if output = "index", or a merged linelist if output = "merged".
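For a merged linelist, the join types accepted by mode can be illustrated in base R on two toy data frames. This is a sketch assuming the conventional (dplyr-style) meaning of each join: the exact, hypothetical `id` column stands in for the fuzzy matching that match_cases performs on the by columns, and the code does not show how match_cases builds its merged output.

```r
## Toy linelists sharing a hypothetical exact key `id`
a <- data.frame(id = c(1, 2, 3), age = c(30, 41, 25))
b <- data.frame(id = c(2, 3, 4),
                onset = as.Date(c("2020-03-01", "2020-03-03", "2020-03-07")))

inner <- merge(a, b, by = "id")                # matched cases only
left  <- merge(a, b, by = "id", all.x = TRUE)  # all cases in a, NA where unmatched
right <- merge(a, b, by = "id", all.y = TRUE)  # all cases in b
full  <- merge(a, b, by = "id", all = TRUE)    # all cases in either linelist
semi  <- a[a$id %in% b$id, ]                   # cases in a with a match, a's columns only
anti  <- a[!(a$id %in% b$id), ]                # cases in a without a match

inner$id  # 2 3
full$id   # 1 2 3 4
anti$id   # 1
```

The semi and anti joins act as filters on the first linelist rather than combining columns, which is why they are written with %in% instead of merge().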
Author(s)
Finlay Campbell (finlaycampbell93@gmail.com)
Examples
data(sample_linelists)
linelist_a <- sample_linelists$linelist_a
linelist_b <- sample_linelists$linelist_b
## examine linelists
head(linelist_a)
head(linelist_b)
## specify matching columns
by <- matrix(c("numeric_a", "numeric_b",
               "character_a", "character_b",
               "date_a", "date_b"),
             ncol = 2, byrow = TRUE)
## define thresholds
max_dist <- c(5, 1, 5)
## find matching case indices
matches <- match_cases(linelist_a, linelist_b, by, max_dist)
head(matches)
## merge linelists
linelist <- match_cases(linelist_a, linelist_b, by, max_dist, output = "merged")
head(linelist)