gammaCKpar: gammaCKpar

View source: R/gammaCKpar.R

gammaCKpar    R Documentation

gammaCKpar

Description

Field comparisons for string variables. Each pair of values is assigned one of three agreement patterns: 0 (total disagreement), 1 (partial agreement), or 2 (agreement). The distance between strings is calculated with the Jaro-Winkler measure by default; other measures can be selected through the method argument.
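The three-way coding described above can be sketched in base R as follows. This is a minimal illustration, not the fastLink implementation: the helper name classify_agreement is hypothetical, it takes similarity scores that have already been computed, and treating both cutoffs as inclusive lower bounds is an assumption based on the argument descriptions.

```r
# Hypothetical helper (not part of fastLink): map similarity scores in [0, 1]
# to the agreement patterns 0 (disagreement), 1 (partial), 2 (agreement).
# Assumption: both cutoffs are inclusive lower bounds.
classify_agreement <- function(sim, cut.a = 0.92, cut.p = 0.88) {
  stopifnot(cut.p <= cut.a)
  pattern <- integer(length(sim))            # start with 0 = disagreement
  pattern[which(sim >= cut.p)] <- 1L         # at least a partial match
  pattern[which(sim >= cut.a)] <- 2L         # upgraded to a full match
  pattern[is.na(sim)] <- NA_integer_         # missing comparisons stay missing
  pattern
}

sim <- c(1.00, 0.95, 0.90, 0.50, NA)
patterns <- classify_agreement(sim)
print(patterns)
```

Applying the higher cutoff after the lower one means a score that clears cut.a overwrites the partial code, so each pair ends up with exactly one pattern.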

Usage

gammaCKpar(matAp, matBp, n.cores = NULL, cut.a = 0.92, cut.p = 0.88,
  method = "jw", w = 0.10)

Arguments

matAp

vector storing the comparison field in data set 1

matBp

vector storing the comparison field in data set 2

n.cores

Number of cores to parallelize over. Default is NULL.

cut.a

Lower bound on the similarity score for a full match, ranging between 0 and 1. Default is 0.92.

cut.p

Lower bound on the similarity score for a partial match, ranging between 0 and 1. Default is 0.88.

method

String distance method. Options are "jw" (Jaro-Winkler, the default), "dl" (Damerau-Levenshtein), "jaro" (Jaro), and "lv" (Levenshtein edit distance).

w

Parameter that describes the importance of the first characters of a string (only needed if method = "jw"). Default is 0.10.
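To make the role of w concrete, the sketch below applies the standard Winkler prefix adjustment, jw = jaro + l * w * (1 - jaro), where l is the length of the common prefix (capped at 4 characters in the standard definition). The helper names are hypothetical, the Jaro similarity is supplied as an input rather than computed, and whether fastLink computes the score in exactly this way is an assumption.

```r
# Hypothetical helper: length of the shared prefix, capped at max_len characters.
common_prefix_length <- function(a, b, max_len = 4L) {
  ca <- strsplit(a, "", fixed = TRUE)[[1]]
  cb <- strsplit(b, "", fixed = TRUE)[[1]]
  n <- min(length(ca), length(cb), max_len)
  if (n == 0L) return(0L)
  same <- ca[seq_len(n)] == cb[seq_len(n)]
  # Count leading matches only: stop at the first mismatch.
  if (all(same)) n else which(!same)[1] - 1L
}

# Hypothetical helper: standard Winkler adjustment of a Jaro similarity.
winkler_adjust <- function(jaro, a, b, w = 0.10) {
  l <- common_prefix_length(a, b)
  jaro + l * w * (1 - jaro)
}

# Textbook pair: the Jaro similarity of "MARTHA" and "MARHTA" is 17/18.
jaro_martha <- 17 / 18
jw <- winkler_adjust(jaro_martha, "MARTHA", "MARHTA", w = 0.10)
print(jw)
```

A larger w rewards agreement on the leading characters more strongly, and w = 0 reduces the score to the plain Jaro similarity.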

Value

gammaCKpar returns a list with the indices corresponding to each matching pattern, which can be fed directly into tableCounts and matchesLink.
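As a rough picture of what "indices corresponding to each matching pattern" can look like, the base R sketch below builds a similarity matrix for two short vectors and collects the (row, column) positions that fall into each pattern. Everything here is an assumption for illustration: the similarity is a normalized Levenshtein score from adist() rather than Jaro-Winkler, cut.p is relaxed to 0.70 so that the toy data produce a partial match, and the list names agree and partial are hypothetical, not the structure fastLink actually returns.

```r
# Toy comparison fields (hypothetical data).
a <- c("maria", "john", "anne")
b <- c("maria", "jon", "bob")

# Normalized Levenshtein similarity: 1 - distance / length of the longer string.
d <- adist(a, b)                              # rows index a, columns index b
max_len <- outer(nchar(a), nchar(b), pmax)
sim <- 1 - d / max_len

cut.a <- 0.92
cut.p <- 0.70   # relaxed from the documented default of 0.88, for this toy data only

# Positions (row = element of a, col = element of b) for each pattern.
full    <- which(sim >= cut.a, arr.ind = TRUE)
partial <- which(sim >= cut.p & sim < cut.a, arr.ind = TRUE)

pattern_indices <- list(agree = full, partial = partial)
print(pattern_indices)
```

Storing only the positions of agreeing and partially agreeing pairs keeps the result small, since every pair not listed is implicitly a disagreement.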

Author(s)

Ted Enamorado <ted.enamorado@gmail.com>, Ben Fifield <benfifield@gmail.com>, and Kosuke Imai

Examples

## Not run: 
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)

## End(Not run)


kosukeimai/fastLink documentation built on Nov. 17, 2023, 8:11 p.m.