pair_blocking: Generate pairs using simple blocking


View source: R/pair_blocking.R

Description

Generates all combinations of records from x and y where the blocking variables are equal.
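To make the idea concrete, here is a minimal base-R sketch of simple blocking. It is a hypothetical helper (`blocking_pairs_sketch` is not part of reclin) and assumes that an inner join on the blocking variables is an acceptable stand-in for what `pair_blocking` does with `large = FALSE`. It also assumes that no blocking variable is itself named `x` or `y`, and that the attribute names used for `add_xy` are `"x"` and `"y"`.

```r
# Hypothetical base-R sketch of simple blocking; this is NOT reclin's
# implementation, only an illustration of the idea.
blocking_pairs_sketch <- function(x, y, blocking_var, add_xy = TRUE) {
  # Keep a row number next to the blocking variables of each data set
  # (assumes no blocking variable is itself named "x" or "y")
  x_keys <- data.frame(x = seq_len(nrow(x)), x[blocking_var])
  y_keys <- data.frame(y = seq_len(nrow(y)), y[blocking_var])
  # An inner join on the blocking variables returns every combination of
  # an x record and a y record that fall in the same block
  pairs <- merge(x_keys, y_keys, by = blocking_var)
  pairs <- pairs[order(pairs$x, pairs$y), c("x", "y")]
  rownames(pairs) <- NULL
  # Optionally keep the original data sets around, mirroring add_xy
  # (the attribute names "x" and "y" are an assumption)
  if (add_xy) {
    attr(pairs, "x") <- x
    attr(pairs, "y") <- y
  }
  pairs
}

x <- data.frame(id = 1:3, postcode = c("1234", "1234", "5678"),
                stringsAsFactors = FALSE)
y <- data.frame(id = 1:3, postcode = c("1234", "5678", "9999"),
                stringsAsFactors = FALSE)
p <- blocking_pairs_sketch(x, y, "postcode")
print(p)
```

With this toy data the sketch returns the row-number pairs (1, 1), (2, 1) and (3, 2). The y record with postcode 9999 has no partner in x, so it does not appear in any pair.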

Usage

pair_blocking(
  x,
  y,
  blocking_var = NULL,
  large = TRUE,
  add_xy = TRUE,
  chunk_size = 1e+07
)

Arguments

x

first data.frame

y

second data.frame

blocking_var

the variables defining the blocks or strata for which all pairs of x and y will be generated.

large

should the pairs be returned as an ldat object.

add_xy

add x and y as attributes to the returned pairs. This makes calling some subsequent operations that need x and y (such as compare_pairs) easier.

chunk_size

used when large = TRUE to specify the approximate number of pairs that are kept in memory.

Details

Generating all pairs of records from two data sets is usually the first step when linking them. However, this often results in far too many pairs. Therefore, blocking is usually applied: only pairs of records that agree on the blocking variables are generated.
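The saving from blocking can be estimated before any pairs are generated. Without blocking, the number of pairs is nrow(x) * nrow(y). With blocking, each block b contributes n_x(b) * n_y(b) pairs, where n_x(b) and n_y(b) are the number of records of x and y in that block. The function below is a hypothetical sketch (not a reclin function) that computes both counts for a single blocking variable.

```r
# Hypothetical helper (not part of reclin): compare the number of candidate
# pairs with and without blocking on a single variable.
count_pairs_sketch <- function(x, y, blocking_var) {
  # Without blocking every record in x is paired with every record in y
  all_pairs <- as.numeric(nrow(x)) * nrow(y)
  # With blocking, a block holding n_x records of x and n_y records of y
  # contributes n_x * n_y pairs; blocks present in only one data set add none
  nx <- table(x[[blocking_var]])
  ny <- table(y[[blocking_var]])
  shared <- intersect(names(nx), names(ny))
  blocked_pairs <- sum(as.numeric(nx[shared]) * as.numeric(ny[shared]))
  c(all = all_pairs, blocked = blocked_pairs)
}

x <- data.frame(id = 1:3, postcode = c("1234", "1234", "5678"),
                stringsAsFactors = FALSE)
y <- data.frame(id = 1:3, postcode = c("1234", "5678", "9999"),
                stringsAsFactors = FALSE)
counts <- count_pairs_sketch(x, y, "postcode")
print(counts)
```

For these three-record toy data sets, blocking on postcode reduces the 9 possible pairs to 3. On realistic data sets the reduction is usually much larger, because the number of unblocked pairs grows with the product of the two data set sizes.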

Value

When large is FALSE, a data.frame with two columns, x and y, is returned. Columns x and y are row numbers from data.frames x and y respectively. When large is TRUE, an object of type ldat is returned.
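Because columns x and y hold row numbers, a pairs data.frame can be used to index the original data sets directly. The sketch below builds a small pairs data.frame by hand, assuming it has the two-column layout described above for `large = FALSE`. It then places the paired records side by side using plain base-R indexing. The data and the column names `name_x`, `name_y` are illustrative only.

```r
# Illustrative data; the pairs data.frame is written by hand here and
# assumed to match the large = FALSE layout (row numbers into x and y)
x <- data.frame(name = c("Ann", "Bob", "Cas"),
                postcode = c("1234", "1234", "5678"),
                stringsAsFactors = FALSE)
y <- data.frame(name = c("Anne", "Kees"),
                postcode = c("1234", "5678"),
                stringsAsFactors = FALSE)
pairs <- data.frame(x = c(1L, 2L, 3L), y = c(1L, 1L, 2L))

# Look up the records behind each pair by their row numbers
side_by_side <- data.frame(
  name_x   = x$name[pairs$x],
  name_y   = y$name[pairs$y],
  postcode = x$postcode[pairs$x],
  stringsAsFactors = FALSE
)
print(side_by_side)
```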

Examples

data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")

reclin documentation built on Nov. 23, 2021, 9:09 a.m.