getPatterns: getPatterns

getPatterns R Documentation

getPatterns

Description

Get the full matching patterns for all matched pairs in dataset A and dataset B.

Usage

getPatterns(
  matchesA,
  matchesB,
  varnames,
  stringdist.match,
  numeric.match,
  partial.match,
  stringdist.method = "jw",
  cut.a = 0.92,
  cut.p = 0.88,
  jw.weight = 0.1,
  cut.a.num = 1,
  cut.p.num = 2.5
)

Arguments

matchesA

A dataframe of the matched observations in dataset A, with all variables used to inform the match.

matchesB

A dataframe of the matched observations in dataset B, with all variables used to inform the match.

varnames

A vector of variable names to use for matching. Must be present in both matchesA and matchesB.

stringdist.match

A vector of booleans, indicating whether to use string distance matching when determining matching patterns on each variable. Must be the same length as varnames.

numeric.match

A vector of booleans, indicating whether to use numeric pairwise distance matching when determining matching patterns on each variable. Must be the same length as varnames.

partial.match

A vector of booleans, indicating whether to include a partial matching category for the string distances. Must be the same length as varnames. Default is FALSE for all variables.

stringdist.method

String distance method for calculating similarity. Options are "jw" (Jaro-Winkler, the default), "jaro" (Jaro), and "lv" (Levenshtein edit distance).

cut.a

Lower bound on the string similarity for a full match, ranging between 0 and 1. Default is 0.92.

cut.p

Lower bound on the string similarity for a partial match, ranging between 0 and 1. Default is 0.88.
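The two cutoffs can be read as a three-way classification of a string similarity score. The base R sketch below illustrates that reading. The helper name classify_similarity and the 2/1/0 coding for full, partial, and no match are assumptions made for illustration; this page does not state the coding getPatterns() uses.

```r
# Hypothetical helper (not part of fastLink): map similarity scores in [0, 1]
# to pattern codes. Assumed coding: 2 = full match, 1 = partial, 0 = no match.
classify_similarity <- function(sim, cut.a = 0.92, cut.p = 0.88, partial = TRUE) {
  if (partial) {
    # Scores between cut.p and cut.a fall into the partial category.
    ifelse(sim >= cut.a, 2, ifelse(sim >= cut.p, 1, 0))
  } else {
    # Without a partial category, anything below cut.a is a non-match.
    ifelse(sim >= cut.a, 2, 0)
  }
}

sims <- c(0.97, 0.90, 0.50)
with_partial    <- classify_similarity(sims)
without_partial <- classify_similarity(sims, partial = FALSE)
```

Under these assumptions, a score of 0.90 counts as a partial match only when the partial category is enabled for that variable.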

jw.weight

Parameter that describes the importance of the first characters of a string (only needed if stringdist.method = "jw"). Default is 0.1.
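To see what this weight does, the sketch below applies the standard Jaro-Winkler adjustment, which raises a Jaro similarity in proportion to the length of the shared prefix (capped at four characters). The helper name jw_similarity is hypothetical, and the sketch assumes the standard definition rather than reproducing the package's internal code.

```r
# Hypothetical helper: Jaro-Winkler similarity from a Jaro similarity and the
# length of the common prefix, using the standard definition
#   jw = jaro + l * p * (1 - jaro), with l capped at 4.
jw_similarity <- function(jaro, prefix_len, p = 0.1) {
  l <- pmin(prefix_len, 4)
  jaro + l * p * (1 - jaro)
}

# "MARTHA" vs "MARHTA": Jaro similarity is 17/18, shared prefix "MAR" (length 3).
jw_martha <- jw_similarity(17 / 18, prefix_len = 3)

# With p = 0 the prefix bonus vanishes and the Jaro similarity is returned.
jw_no_bonus <- jw_similarity(17 / 18, prefix_len = 3, p = 0)
```

A larger weight pushes pairs that agree on their first characters closer to 1, which makes them more likely to clear cut.a or cut.p.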

cut.a.num

Cutoff on the absolute numeric difference for a full match. Default is 1.

cut.p.num

Cutoff on the absolute numeric difference for a partial match. Default is 2.5.
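Because the default for a full numeric match (1) is smaller than the default for a partial match (2.5), one plausible reading is that both cutoffs are compared against the absolute difference between the two values, with smaller differences indicating closer agreement. The base R sketch below works through that reading. The helper name classify_numeric, the direction of the comparison, and the 2/1/0 coding are all assumptions, not behavior confirmed by this page.

```r
# Hypothetical helper (not part of fastLink): classify numeric pairs by
# absolute difference. Assumed coding: 2 = full, 1 = partial, 0 = no match.
classify_numeric <- function(a, b, cut.a.num = 1, cut.p.num = 2.5, partial = TRUE) {
  d <- abs(a - b)
  if (partial) {
    # Assumption: differences up to cut.a.num are full matches, and
    # differences up to cut.p.num are partial matches.
    ifelse(d <= cut.a.num, 2, ifelse(d <= cut.p.num, 1, 0))
  } else {
    ifelse(d <= cut.a.num, 2, 0)
  }
}

ages_a <- c(30, 41, 55)
ages_b <- c(30.5, 43, 60)
age_pattern <- classify_numeric(ages_a, ages_b)
```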

Value

getPatterns() returns a dataframe with one row per matched pair and one column per matching variable, where each entry gives the pair's matching pattern on that variable.
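A minimal sketch of a call, using the argument names from the Usage section. The toy data, the column names firstname and birthyear, and the assumption that row i of matchesA pairs with row i of matchesB are all invented for illustration. The call is guarded so the snippet still runs when fastLink is not installed, and no output is shown because it depends on the installed version.

```r
# Toy matched pairs (hypothetical data): row i of matchesA is assumed to be
# the match of row i of matchesB.
matchesA <- data.frame(
  firstname = c("john", "maria"),
  birthyear = c(1980, 1975),
  stringsAsFactors = FALSE
)
matchesB <- data.frame(
  firstname = c("jon", "maria"),
  birthyear = c(1981, 1990),
  stringsAsFactors = FALSE
)
varnames <- c("firstname", "birthyear")

# One flag per variable, in the same order as varnames.
stringdist.match <- c(TRUE, FALSE)   # compare firstname by string distance
numeric.match    <- c(FALSE, TRUE)   # compare birthyear by numeric distance
partial.match    <- c(TRUE, TRUE)    # allow a partial category for both

# Only call getPatterns() when the fastLink package is available.
if (requireNamespace("fastLink", quietly = TRUE)) {
  patterns <- fastLink::getPatterns(
    matchesA, matchesB,
    varnames = varnames,
    stringdist.match = stringdist.match,
    numeric.match = numeric.match,
    partial.match = partial.match
  )
  print(patterns)
}
```

Keeping the three flag vectors next to varnames makes it easier to check that each has one entry per variable, as the Arguments section requires.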

Author(s)

Ted Enamorado <ted.enamorado@gmail.com> and Ben Fifield <benfifield@gmail.com>


fastLink documentation built on Nov. 17, 2023, 9:06 a.m.