This function matches cases between linelists on specified columns using user-specified matching thresholds.
x |
A dataframe containing the columns specified in the first column of the by argument. |
y |
A dataframe containing the columns specified in the second column of the by argument. |
by |
Linelist columns to match cases on. This can be a character vector indicating column names found in both linelists, a 2-column integer matrix indicating the pairs of columns to be matched in linelist 1 and linelist 2, or a 2-column character matrix indicating the names of the columns to be matched in linelist 1 and linelist 2. |
score_fun |
An optional list of functions for customised evaluations of matches. Each function must accept two vectors as arguments and return a numeric vector of the same length indicating the quality of the match. |
rescale |
A logical indicating whether scores for each variable should be rescaled between 0 and 1. |
na_score |
A numeric indicating the score to be assigned to NA scores. NA handling can also be specified in a variable-specific manner by providing custom scoring functions to score_fun. |
output |
If "scores", returns a dataframe of matched scores. If "merged", returns a merged linelist using the matched indices. If "review", returns a dataframe for manual reviewing of matches. |
top_n |
An optional integer indicating the number of matches to keep per row of the |
min_score |
An optional numeric indicating the minimum match score required to keep a match. |
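The score_fun argument expects functions that take two vectors and return a numeric vector of the same length. A minimal sketch of two such functions follows. The names exact_score, date_score and max_gap are hypothetical and not part of the package, and how the list entries are paired with the columns in by is not described on this page, so check that before passing them in.

```r
## Hypothetical scoring function: 1 for an exact match, 0 otherwise,
## NA where either value is missing
exact_score <- function(a, b) {
  as.numeric(a == b)
}

## Hypothetical scoring function for dates: 1 for identical dates,
## decreasing linearly to 0 at a gap of max_gap days or more
date_score <- function(a, b, max_gap = 30) {
  gap <- abs(as.numeric(difftime(as.Date(b), as.Date(a), units = "days")))
  pmax(0, 1 - gap / max_gap)
}

## Both functions return one score per pair of elements
exact_score(c("a", "b", NA), c("a", "c", "d"))
date_score(as.Date(c("2020-01-01", "2020-01-01")),
           as.Date(c("2020-01-01", "2020-01-16")))
```

Both functions return scores between 0 and 1, with higher values meaning a better match, which is an assumption consistent with min_score being a lower bound on match quality.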
Depending on the value of output, a dataframe containing either the matching scores, a merged database or the matches for manual review.
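To illustrate what min_score and top_n filtering might do to a table of match scores, here is a sketch in base R. It does not call match_rows. The column names index_x, index_y and score are hypothetical, as is the assumption that a score equal to min_score is kept; the actual output may be laid out differently.

```r
## Hypothetical scores table; the column names are illustrative only
scores <- data.frame(
  index_x = c(1, 1, 1, 2, 2),
  index_y = c(1, 2, 3, 1, 3),
  score   = c(0.9, 0.4, 0.7, 0.2, 0.8)
)

## Keep only matches at or above the minimum score
min_score <- 0.5
kept <- scores[scores$score >= min_score, ]

## Keep the top_n best-scoring matches for each row of x
top_n <- 1
kept <- kept[order(kept$index_x, -kept$score), ]
rank_in_x <- ave(kept$score, kept$index_x, FUN = seq_along)
kept <- kept[rank_in_x <= top_n, ]
kept
```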
Finlay Campbell (finlaycampbell93@gmail.com)
data(sample_linelists)
## examine linelists
head(sample_linelists$linelist_a)
head(sample_linelists$linelist_b)
## specify matching columns
by <- matrix(c("numeric_a", "numeric_b",
               "character_a", "character_b",
               "date_a", "date_b"),
             ncol = 2, byrow = TRUE)
## find matching case indices
matches <- match_rows(
  sample_linelists$linelist_a,
  sample_linelists$linelist_b,
  by
)
head(matches)
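The by argument can also be given as a 2-column integer matrix of column positions, or as a character vector when both linelists share column names. The sketch below builds these alternative forms on two toy data frames. The data frames and all column names are hypothetical and only serve to show the shapes involved.

```r
## Toy linelists with hypothetical column names
linelist_1 <- data.frame(
  id    = 1:2,
  onset = as.Date(c("2020-01-01", "2020-01-05")),
  age   = c(30, 41)
)
linelist_2 <- data.frame(
  age_years  = c(41, 30),
  case_id    = 2:1,
  date_onset = as.Date(c("2020-01-05", "2020-01-01"))
)

## 2-column character matrix: one row per pair of columns to compare
by_names <- matrix(c("id", "case_id",
                     "onset", "date_onset",
                     "age", "age_years"),
                   ncol = 2, byrow = TRUE)

## Equivalent 2-column integer matrix of column positions
by_index <- cbind(match(by_names[, 1], names(linelist_1)),
                  match(by_names[, 2], names(linelist_2)))

## Character vector form, usable only when both linelists share these names
names(linelist_2) <- c("age", "id", "onset")
by_shared <- intersect(names(linelist_1), names(linelist_2))
```

Any of these objects could then be passed as the third argument, in the same way as by in the example above.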