match_data: Match Data

Description

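Matches rows of a source data frame to rows of a target data frame: an optional exact join on selected columns comes first, and the remaining rows are then fuzzy-matched on the match columns using a string distance metric.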

Usage

match_data(
  .source,
  .target,
  .cols_match,
  .cols_join = NULL,
  .cols_exact = NULL,
  .max_match = 10,
  .method = "osa",
  .verbose = TRUE,
  .workers = future::availableCores(),
  .char_block = c(Inf, Inf)
)

Arguments

.source

The source data frame.
(Must contain a unique id column and the columns you want to match on)

.target

The target data frame.
(Must contain a unique id column and the columns you want to match on)

.cols_match

A character vector of columns to perform fuzzy matching on.

.cols_join

Columns to perform an exact match on before fuzzy matching.
(Matched IDs will be excluded from fuzzy-match)
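The idea behind .cols_join can be sketched in base R. This is only an illustration, not the package's implementation: the toy data frames, the id column name `id`, and the choice to drop matched ids on both the source and the target side are all assumptions.

```r
# Toy data; column names (id, name, iso3) are assumptions for illustration
src <- data.frame(id = 1:3,
                  name = c("acme", "globex", "initech"),
                  iso3 = c("USA", "USA", "DEU"),
                  stringsAsFactors = FALSE)
tgt <- data.frame(id = 11:13,
                  name = c("acme", "globex corp", "initech"),
                  iso3 = c("USA", "USA", "FRA"),
                  stringsAsFactors = FALSE)

cols_join <- c("name", "iso3")

# Step 1: pair rows that agree exactly on every join column
exact <- merge(src, tgt, by = cols_join, suffixes = c("_s", "_t"))

# Step 2: drop the ids that were already matched (assumed: on both sides);
# only the remaining rows would go on to fuzzy matching
src_rest <- src[!src$id %in% exact$id_s, ]
tgt_rest <- tgt[!tgt$id %in% exact$id_t, ]
```

Here only "acme"/"USA" agrees on both join columns, so source id 1 and target id 11 are paired and the other rows remain for the fuzzy step.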

.cols_exact

Columns that must be matched perfectly.
(Data will be partitioned using those columns)
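Partitioning on .cols_exact could look roughly like the following base-R sketch. It is written under assumptions (toy data, an id column named `id`) and does not reproduce the package's internals; it only shows that candidate pairs are formed within a partition, never across partitions.

```r
# Toy data; column names (id, name, iso3) are assumptions for illustration
src <- data.frame(id = 1:4,
                  name = c("acme", "globex", "initech", "umbrella"),
                  iso3 = c("USA", "USA", "DEU", "DEU"),
                  stringsAsFactors = FALSE)
tgt <- data.frame(id = 11:13,
                  name = c("acme inc", "initech gmbh", "hooli"),
                  iso3 = c("USA", "DEU", "FRA"),
                  stringsAsFactors = FALSE)

cols_exact <- "iso3"

# One partition per distinct value of the exact columns
src_parts <- split(src, src[cols_exact], drop = TRUE)
tgt_parts <- split(tgt, tgt[cols_exact], drop = TRUE)

# Only partitions present on both sides can produce candidate pairs
shared <- intersect(names(src_parts), names(tgt_parts))

# All source/target combinations within each shared partition
pairs <- lapply(shared, function(k) {
  merge(src_parts[[k]], tgt_parts[[k]], by = cols_exact,
        suffixes = c("_s", "_t"))
})
candidates <- do.call(rbind, pairs)
```

The "FRA" target row has no source partition to pair with, so it never becomes a candidate, and the number of comparisons drops from 12 to 4.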

.max_match

Maximum number of matches to return (Default = 10)

.method

One of "osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex".
See stringdist-metrics and stringdist in the stringdist package.
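To get a feel for how the choice of .method changes the result, the metrics can be tried directly with stringdist::stringdist(). The strings below are arbitrary examples, and the snippet assumes the stringdist package is installed; how match_data turns these distances into a ranking is not shown here.

```r
library(stringdist)

# "osa" counts a swap of two adjacent characters as a single edit
d_osa <- stringdist("abc", "acb", method = "osa")  # 1

# "lv" (Levenshtein) has no transposition operation, so the same
# swap costs two edits
d_lv <- stringdist("abc", "acb", method = "lv")    # 2

# "jw" returns a distance between 0 and 1 rather than an edit count
# (with the default p = 0 this is the plain Jaro distance)
d_jw <- stringdist("martha", "marhta", method = "jw")
```

Edit-based metrics such as "osa" and "lv" grow with string length, while "jw", "cosine" and "jaccard" are bounded, which matters when comparing scores across columns of very different lengths.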

.verbose

Print additional information?

.workers

Number of cores to use (Default: all cores, as determined by future::availableCores())

.char_block

Character Block Size. Used to partition data.

  • First element chunks the source data in ngram-blocks.

  • Second element allows for characters in target below/above block size.

Value

A dataframe

Examples

tab_source <- table_source[1:100, ]
tab_target <- table_target[1:999, ]
cols_match <- c("name", "iso3", "city", "address")
cols_join  <- c("name", "iso3")
cols_exact <- "iso3"

match_data(
  .source = tab_source,
  .target = tab_target,
  .cols_match = cols_match,
  .cols_join = cols_join,
  .cols_exact = cols_exact
)

MatthiasUckert/Rmatch documentation built on Jan. 3, 2022, 11:09 p.m.