find_similar: Find match between one person from the first dataset (t1) and...

Description

Find a match between one person from the first dataset (t1) and the whole second dataset (t2).

Usage

find_similar(source, target, row, eq = NULL, eq_tol = NULL, eq_sub = NULL,
  ht = NULL, lt = NULL, hte = NULL, lte = NULL, id = "row_id",
  compare_cols = NULL, keep_most_similar = TRUE, verbose = TRUE)

Arguments

source

Name of the first table

target

Name of the second table

row

Row from the first table for which a similar person is sought

eq

Vector of variables which should be equal in t1 and t2

eq_tol

Vector of variables which should be approximately equal in t1 and t2, within a specified range (tolerance)

eq_sub

Vector of variables for which the value in t1 should be a substring of the value in t2 (optimized for double last names after marriage)

ht

Vector of variables which should be higher in t1 than in t2

lt

Vector of variables which should be lower in t1 than in t2

hte

Vector of variables which should be higher in t1 than in t2, or equal

lte

Vector of variables which should be lower in t1 than in t2, or equal

id

Column in the source and target datasets containing the ID of a row

compare_cols

Columns to be used for comparison to remove duplicates

keep_most_similar

Logical indicating whether entities with the same attributes should be kept, or whether the entity most similar to the original record should be found (and duplicates removed)

verbose

Logical indicating whether a message should be displayed for every 250th row
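
Illustration

The matching conditions described by eq, eq_tol, eq_sub and lt can be pictured with a minimal base R sketch. This is not the implementation of find_similar and does not call it; it only shows one plausible reading of the conditions. The data, the column names (first_name, last_name, birth_year, age) and the tolerance of 1 are assumptions made for the example, and how find_similar itself expects the tolerance to be supplied is not stated on this page.

```r
# Hypothetical data: one person in t1 and three candidates in t2.
t1 <- data.frame(
  row_id = 1,
  first_name = "Jana",
  last_name = "Novakova",
  birth_year = 1970,
  age = 40,
  stringsAsFactors = FALSE
)
t2 <- data.frame(
  row_id = c(10, 11, 12),
  first_name = c("Jana", "Jana", "Petra"),
  last_name = c("Novakova-Dvorakova", "Svobodova", "Novakova"),
  birth_year = c(1971, 1970, 1970),
  age = c(44, 44, 44),
  stringsAsFactors = FALSE
)

person <- t1[1, ]

# eq: the value must be identical in t1 and t2
is_eq <- t2$first_name == person$first_name

# eq_tol: the values must agree within a tolerance (assumed here: +/- 1)
tol <- 1
is_eq_tol <- abs(t2$birth_year - person$birth_year) <= tol

# eq_sub: the t1 value must occur as a substring of the t2 value,
# e.g. a maiden name inside a double last name
is_eq_sub <- grepl(person$last_name, t2$last_name, fixed = TRUE)

# lt: the t1 value must be strictly lower than the t2 value
is_lt <- person$age < t2$age

# Rows of t2 that satisfy every condition at once
candidates <- t2[is_eq & is_eq_tol & is_eq_sub & is_lt, ]
candidate_ids <- candidates$row_id
```

In this sketch only the row whose last name contains the t1 last name and whose first name matches survives the combined filter. ht, hte and lte would follow the same pattern with >, >= and <=.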


skvrnami/rimr documentation built on June 6, 2019, 3:50 p.m.