View source: R/hmatch_composite.R
hmatch_composite | R Documentation |
Match a data frame with raw, potentially messy hierarchical data (e.g. province, county, township) against a reference dataset, using a variety of matching strategies implemented in sequence to identify the best-possible match (i.e. highest-resolution) for each row.
The sequence of matching strategies is:
1. (optional) manually-specified matching with hmatch_manual
2. complete matching with hmatch(..., allow_gaps = FALSE)
3. partial matching with hmatch(..., allow_gaps = TRUE)
4. fuzzy partial matching with hmatch(..., allow_gaps = TRUE, fuzzy = TRUE)
5. best-possible matching with hmatch_settle
Each approach is applied only to the rows of data for which a single match has not already been identified by the previous approaches.
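The "only unresolved rows move on" behaviour can be illustrated with a minimal base-R sketch. This is not the package's implementation: match_in_sequence() and the three toy strategies are hypothetical, the data are plain character vectors rather than hierarchical data frames, and base adist() (Levenshtein distance) stands in for the fuzzy step.

```r
# Hypothetical helper (not part of hmatch): run strategies in order, handing
# each one only the values that no earlier strategy resolved to a single match.
match_in_sequence <- function(raw, ref, strategies) {
  matched <- rep(NA_integer_, length(raw))   # index into ref; NA = unresolved
  used <- rep(NA_character_, length(raw))    # strategy that resolved each value
  for (nm in names(strategies)) {
    todo <- which(is.na(matched))            # values still without a match
    if (length(todo) == 0L) break
    for (i in todo) {
      hits <- strategies[[nm]](raw[i], ref)  # candidate positions in ref
      if (length(hits) == 1L) {              # accept only a single match
        matched[i] <- hits
        used[i] <- nm
      }
    }
  }
  data.frame(raw = raw, ref = ref[matched], strategy = used,
             stringsAsFactors = FALSE)
}

# Toy strategies, ordered from strictest to most permissive
strategies <- list(
  exact = function(x, ref) which(ref == x),
  standardized = function(x, ref) {
    which(tolower(trimws(ref)) == tolower(trimws(x)))
  },
  fuzzy = function(x, ref) which(adist(tolower(x), tolower(ref))[1, ] <= 1L)
)

ref <- c("Ontario", "Quebec", "Manitoba")
raw <- c("Ontario", " quebec", "Manitobaa", "Yukon")
res <- match_in_sequence(raw, ref, strategies)
print(res)
```

In this sketch the strategy column records which step resolved each value, and a value that no step resolves keeps NA in both ref and strategy.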
hmatch_composite(
raw,
ref,
man,
pattern,
pattern_ref = pattern,
by,
by_ref = by,
code_col,
type = "resolve_left",
allow_gaps = TRUE,
fuzzy = FALSE,
fuzzy_method = "osa",
fuzzy_dist = 1L,
dict = NULL,
ref_prefix = "ref_",
std_fn = string_std,
...
)
raw |
data frame containing hierarchical columns with raw data |
ref |
data frame containing hierarchical columns with reference data |
man |
(optional) data frame of manually-specified matches, relating a given set of hierarchical values to the code within ref |
pattern |
regex pattern to match the hierarchical columns in raw |
pattern_ref |
regex pattern to match the hierarchical columns in ref. Defaults to pattern. |
by |
vector giving the names of the hierarchical columns in raw |
by_ref |
vector giving the names of the hierarchical columns in ref. Defaults to by. |
code_col |
name of the code column containing codes for matching |
type |
type of join ("resolve_left", "resolve_inner", or "resolve_anti"). Defaults to "resolve_left". See join_types. |
allow_gaps |
logical indicating whether to allow missing values below the match level, where 'match level' is the highest level with a non-missing value within a given row of raw. Defaults to TRUE. |
fuzzy |
logical indicating whether to use fuzzy-matching (based on the stringdist package). Defaults to FALSE. |
fuzzy_method |
if fuzzy = TRUE, the method to use for string distance calculation. Defaults to "osa". |
fuzzy_dist |
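if fuzzy = TRUE, the maximum string distance at which two values are still classified as a match. Defaults to 1L. |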
dict |
optional dictionary for recoding values within the hierarchical columns of raw |
ref_prefix |
prefix to add to names of returned columns from ref. Defaults to "ref_". |
std_fn |
function to standardize strings during matching. Defaults to string_std. |
... |
additional arguments passed to std_fn |
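To make the allow_gaps wording concrete, here is a small base-R sketch that computes the 'match level' of each row and flags rows with a gap. It assumes the hierarchical columns are ordered from broadest to finest and reads "below the match level" as "at a broader level than the match level". The helpers match_level() and has_gap() and the toy data are hypothetical and not part of the package.

```r
# Toy raw data with three hierarchical levels, broadest first (hypothetical)
toy <- data.frame(
  adm0 = c("Canada", "Canada", "Canada", NA),
  adm1 = c("Ontario", "Ontario", NA, NA),
  adm2 = c("Toronto", NA, "Toronto", NA),
  stringsAsFactors = FALSE
)

# Match level: position of the finest column holding a non-missing value
match_level <- function(df) {
  unname(apply(!is.na(df), 1, function(filled) {
    if (any(filled)) max(which(filled)) else NA_integer_
  }))
}

# A row has a gap when some broader level is missing although a finer one is not
has_gap <- function(df) {
  filled <- !is.na(df)
  level <- match_level(df)
  vapply(seq_len(nrow(df)), function(i) {
    !is.na(level[i]) && !all(filled[i, seq_len(level[i])])
  }, logical(1))
}

levels_found <- match_level(toy)
gaps_found <- has_gap(toy)
print(cbind(toy, match_level = levels_found, has_gap = gaps_found))
```

Under this reading, the third row (Canada, NA, Toronto) is the kind of row that only a call with allow_gaps = TRUE could match at the finest level.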
a data frame obtained by matching the hierarchical columns in raw and ref, using the join type specified by argument type (see join_types for more details)
data(ne_raw)
data(ne_ref)
hmatch_composite(ne_raw, ne_ref, fuzzy = TRUE)