match_complete: Complete Match
In MatthiasUckert/Rmatch: A General Package For Any Matching Tasks

Usage

match_complete(
  .source,
  .target,
  .cols_match,
  .cols_join = NULL,
  .cols_exact = NULL,
  .max_match = 10,
  .method = "osa",
  .verbose = TRUE,
  .workers = future::availableCores(),
  .char_block = c(Inf, Inf),
  .standardize = TRUE,
  .w_unique = NULL,
  .w_custom = NULL,
  .min_sim = NULL,
  .col_score = c("sms", "smw", "smc", "sss", "ssw", "ssc")
)

Arguments

`.source`	The Source Dataframe. (Must contain a unique column id and the columns you want to match on)
`.target`	The Target Dataframe. (Must contain a unique column id and the columns you want to match on)
`.cols_match`	A character vector of columns to perform fuzzy matching.
`.cols_join`	Columns to perform an exact match on before fuzzy matching. (Matched IDs will be excluded from the fuzzy match)
`.cols_exact`	Columns that must be matched perfectly. (Data will be partitioned using those columns)
`.max_match`	Maximum number of matches to return (Default = 10)
`.method`	One of "osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex". See stringdist-metrics in the stringdist package.
`.verbose`	Print additional information?
`.workers`	Number of cores to utilize (Default: all cores, as determined by future::availableCores())
`.char_block`	Character Block Size. Used to partition data. First element chunks the source data in ngram-blocks. Second element allows for characters in target below/above block size.
`.standardize`	Perform String Standardization using standardize_data()?
`.w_unique`	Weights calculated by get_weights()
`.w_custom`	A named numeric vector whose names match the columns of .cols_match, excluding the columns of .cols_exact
`.min_sim`	Named vector with minimum similarities
`.col_score`	Score column generated by scores_data(). Options are:
	sms: Simple Mean (mean over all fuzzy columns)
	smw: Weighted Mean (mean over all fuzzy columns, weighted by get_weights())
	smc: Custom Mean (mean over all fuzzy columns, weighted by custom weights)
	sss: Simple Mean, squared (mean over all fuzzy columns, scores are squared)
	ssw: Weighted Mean, squared (mean over all fuzzy columns, scores are squared before being weighted by get_weights())
	ssc: Custom Mean, squared (mean over all fuzzy columns, scores are squared before being weighted by custom weights)
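
The score options can be illustrated with a minimal base R sketch. It is one reading of the option descriptions, not the code of scores_data(): the similarity matrix `sims` is made up, and normalizing by the sum of the weights is an assumption. smw and ssw would take the same form as smc and ssc, using the weights from get_weights() in place of the custom ones.

```r
# Hypothetical per-column similarities for three candidate pairs (one row each)
sims <- matrix(
  c(0.9, 0.8, 0.4,
    0.6, 1.0, 0.5,
    1.0, 0.2, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("name", "city", "address"))
)

# Custom weights, named after the fuzzy columns (same shape as .w_custom)
w_custom <- c(name = 0.7, city = 0.2, address = 0.1)
w <- w_custom[colnames(sims)]  # align the weights with the column order

sms <- rowMeans(sims)                      # simple mean
smc <- as.vector(sims %*% w) / sum(w)      # custom-weighted mean (assumed normalization)
sss <- rowMeans(sims^2)                    # simple mean of squared scores
ssc <- as.vector(sims^2 %*% w) / sum(w)    # squared scores, then custom-weighted

print(data.frame(sms, smc, sss, ssc))
```

Squaring before averaging penalizes pairs whose columns agree only moderately, so the squared variants separate near-perfect matches from partial ones more sharply.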

Value

A dataframe

Examples

library(Rmatch)  # package name taken from the page header

match_complete(
  .source = table_source[1:100, ],
  .target = table_target[1:999, ],
  .cols_match = c("name", "iso3", "city", "address"),
  .cols_join = c("name", "iso3"),
  .cols_exact = "iso3",
  .max_match = 25,
  .method = "soundex",
  .verbose = TRUE,
  .workers = 4,
  .char_block = c(5, 5),
  .standardize = TRUE,
  .w_unique = NULL,
  .w_custom = c(name = .7, city = .2, address = .1),
  .col_score = "sms"
)
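
To make the roles of .cols_join and .cols_exact more concrete, here is a rough base R sketch of the described flow: an exact join first, removal of the matched IDs, then fuzzy comparison only within partitions that agree on the exact column. It does not call Rmatch and is not the package's implementation. The toy data frames, the removal of matched IDs on both sides, and the similarity formula (1 minus Levenshtein distance over the longer string length, via utils::adist) are all assumptions made for illustration.

```r
# Toy data (hypothetical); each table has a unique id column
source_tbl <- data.frame(
  id   = c("s1", "s2", "s3"),
  name = c("acme corp", "globex", "initech"),
  iso3 = c("USA", "USA", "DEU"),
  stringsAsFactors = FALSE
)
target_tbl <- data.frame(
  id   = c("t1", "t2", "t3"),
  name = c("acme corp", "globex inc", "initech gmbh"),
  iso3 = c("USA", "USA", "DEU"),
  stringsAsFactors = FALSE
)

# Step 1: exact join on the join columns (the role of .cols_join)
exact <- merge(source_tbl, target_tbl, by = c("name", "iso3"),
               suffixes = c("_source", "_target"))

# Step 2: exclude IDs that were already matched exactly (assumed: on both sides)
src_left <- source_tbl[!source_tbl$id %in% exact$id_source, ]
tgt_left <- target_tbl[!target_tbl$id %in% exact$id_target, ]

# Step 3: pair the remaining rows only within the same iso3 partition
# (the role of .cols_exact), then score each pair
fuzzy <- merge(src_left, tgt_left, by = "iso3",
               suffixes = c("_source", "_target"))

# Levenshtein-based similarity in [0, 1] for one pair of strings
lev_sim <- function(a, b) {
  1 - utils::adist(a, b)[1, 1] / max(nchar(a), nchar(b))
}
fuzzy$sim <- mapply(lev_sim, fuzzy$name_source, fuzzy$name_target,
                    USE.NAMES = FALSE)

print(exact)
print(fuzzy)
```

Partitioning on the exact column keeps the number of fuzzy comparisons small, because strings are compared only among rows that already agree on that column.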