reclin_pair_blocking: creates a pair blocking based on columns passed into...

reclin_pair_blocking    R Documentation

Creates a pair blocking based on the columns passed to blocking_var, then generates additional pairs based on tokens that are the same in columns col_nms_x and col_nms_y.

Description

Creates a pair blocking based on the columns passed to blocking_var, then generates additional pairs based on tokens that are the same in columns col_nms_x and col_nms_y.
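
The token step can be pictured with a small base R sketch. This is not the package's implementation: the data, the column names, the `tokenize` helper, and the whitespace/punctuation tokenisation rule are all assumptions made for illustration. It only shows the general idea of pairing rows of x and y that share at least one token.

```r
# Hypothetical toy data; the column names are illustrative only.
x <- data.frame(name = c("acme steel ltd", "blue river foods"),
                stringsAsFactors = FALSE)
y <- data.frame(company = c("river foods inc", "acme holdings", "northern lights"),
                stringsAsFactors = FALSE)

# Hypothetical helper: split strings into a long table of (row, token).
# Tokens here are lower-cased runs of letters and digits (an assumption).
tokenize <- function(values) {
  tokens <- strsplit(tolower(values), "[^a-z0-9]+")
  data.frame(
    row = rep(seq_along(tokens), lengths(tokens)),
    token = unlist(tokens),
    stringsAsFactors = FALSE
  )
}

tok_x <- tokenize(x$name)
tok_y <- tokenize(y$company)

# Join on token, then keep each (row_x, row_y) pair once.
shared <- merge(tok_x, tok_y, by = "token", suffixes = c("_x", "_y"))
token_pairs <- unique(shared[, c("row_x", "row_y")])
token_pairs <- token_pairs[order(token_pairs$row_x, token_pairs$row_y), ]
rownames(token_pairs) <- NULL

print(token_pairs)
```

In this toy data, row 1 of x pairs with row 2 of y through "acme", and row 2 of x pairs with row 1 of y through "river" and "foods".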

Usage

reclin_pair_blocking(
  x,
  y,
  blocking_var,
  token_types,
  large = FALSE,
  add_xy = TRUE,
  chunk_size = 1e+07,
  col_nms_x = colnames(select_if(x, is_character)),
  col_nms_y = colnames(select_if(y, is_character)),
  min_token_u_prob = TOKEN_MIN_UPROB_DEFAULT,
  ...
)

Arguments

x

dataframe

y

dataframe

blocking_var

Vector of column names to block on. Unlike reclin, these columns are combined with 'or'.

token_types

vector of token types

large

Passed to reclin::pair_blocking. Default FALSE

add_xy

Boolean. Default TRUE. Passed to reclin::pair_blocking, inside map_dfr.

chunk_size

Passed to reclin::pair_blocking. Default 1e+07

col_nms_x

passed to token_links

col_nms_y

passed to token_links

min_token_u_prob

passed to get_paired_row_names

...

passed to token_links
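
The 'or' semantics described for blocking_var can be illustrated in base R. The sketch below is an assumption-laden illustration, not the package's code: the data and the `block_on` helper are hypothetical. It blocks on each variable separately and takes the union of the resulting pairs, then contrasts that with requiring a match on every variable.

```r
# Hypothetical toy data with two candidate blocking variables.
x <- data.frame(city = c("Ottawa", "Toronto", "Halifax"),
                postal = c("K1A", "M5V", "B3H"),
                stringsAsFactors = FALSE)
y <- data.frame(city = c("Ottawa", "Calgary", "Halifax"),
                postal = c("K2P", "M5V", "B3H"),
                stringsAsFactors = FALSE)

# Hypothetical helper: row-index pairs of x and y that agree on one variable.
block_on <- function(x, y, var) {
  x_idx <- data.frame(key = x[[var]], x = seq_len(nrow(x)))
  y_idx <- data.frame(key = y[[var]], y = seq_len(nrow(y)))
  merge(x_idx, y_idx, by = "key")[, c("x", "y")]
}

blocking_var <- c("city", "postal")
per_var <- lapply(blocking_var, function(v) block_on(x, y, v))

# 'or': a pair is kept if it agrees on at least one blocking variable.
pairs_or <- unique(do.call(rbind, per_var))
pairs_or <- pairs_or[order(pairs_or$x, pairs_or$y), ]
rownames(pairs_or) <- NULL

# 'and': a pair is kept only if it agrees on every blocking variable.
pairs_and <- Reduce(function(a, b) merge(a, b, by = c("x", "y")), per_var)

print(pairs_or)
print(pairs_and)
```

With this data the 'or' rule keeps three pairs (matching city, matching postal code, or both), while the 'and' rule keeps only the Halifax pair, which agrees on both variables.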


csps-efpc/TokenLink documentation built on Feb. 10, 2023, 3:30 a.m.