blockData: blockData
In fastLink: Fast Probabilistic Record Linkage with Missing Data

Description

Blocks two data sets on one or more variables prior to conducting a merge.

Usage

blockData(dfA, dfB, varnames, window.block, window.size,
          kmeans.block, nclusters, iter.max, n.cores)

Arguments

`dfA`	Dataset A, to be matched to Dataset B.
`dfB`	Dataset B, to be matched to Dataset A.
`varnames`	A vector of variable names to use for blocking. Must be present in both dfA and dfB.
`window.block`	A vector of variable names to block using window blocking. Must be present in varnames.
`window.size`	The size of the window for window blocking. Default is 1 (observations within +/- 1 on the specified variable will be blocked together).
`kmeans.block`	A vector of variable names to block using k-means blocking. Must be present in varnames.
`nclusters`	Number of clusters to create with k-means. The default is the number of clusters for which the average cluster size is 100,000 observations.
`iter.max`	Maximum number of iterations for the k-means algorithm to run. Default is 5000.
`n.cores`	Number of cores to parallelize over. Default is NULL.
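
To make the `window.size` description concrete, the following base R sketch shows one plausible reading of window blocking on a single numeric variable: each distinct value in dataset A defines a block, and observations in dataset B within +/- `window.size` of that value join it. The helper `window_block_sketch` and the element names `dfA.inds` and `dfB.inds` are assumptions made for illustration. This is not the package's implementation, which may form windows differently.

```r
# Hypothetical helper, not part of fastLink: a sketch of window blocking
# on one numeric variable, under the assumptions stated above.
window_block_sketch <- function(a, b, window.size = 1) {
  # Each distinct non-missing value in A is treated as a window center
  centers <- sort(unique(a[!is.na(a)]))
  blocks <- lapply(centers, function(v) {
    list(
      dfA.inds = which(a == v),                    # rows of A at the center
      dfB.inds = which(abs(b - v) <= window.size)  # rows of B inside the window
    )
  })
  names(blocks) <- paste0("block.", seq_along(blocks))
  blocks
}

birthyear_A <- c(1950, 1951, 1960)
birthyear_B <- c(1949, 1951, 1952, 1961, 1975)
blocks <- window_block_sketch(birthyear_A, birthyear_B, window.size = 1)
```

In this sketch a row of dataset B can fall into more than one block (the 1951 record is within one year of both 1950 and 1951), and a row outside every window (1975) is never compared.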

Value

A list with an entry for each block. Each list entry contains two vectors: one with the indices of the block members in dataset A, and another with the indices of the block members in dataset B.
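
The shape of this return value can be illustrated with a small base R sketch of exact blocking. The helper `exact_block_sketch`, the toy data frames, and the element names `dfA.inds` and `dfB.inds` are assumptions made for illustration; check the names against the actual output of `blockData` before relying on them.

```r
# Hypothetical helper, not part of fastLink: builds a list with one entry
# per block, each holding row indices into dataset A and dataset B.
exact_block_sketch <- function(dfA, dfB, varnames) {
  # Combine the blocking variables into a single key per row.
  # Note: paste() turns NA into the string "NA", so missing values
  # would be treated as equal in this simplified sketch.
  keyA <- do.call(paste, c(unname(as.list(dfA[varnames])), sep = "\r"))
  keyB <- do.call(paste, c(unname(as.list(dfB[varnames])), sep = "\r"))
  shared <- intersect(unique(keyA), unique(keyB))
  blocks <- lapply(shared, function(k) {
    list(dfA.inds = which(keyA == k), dfB.inds = which(keyB == k))
  })
  names(blocks) <- paste0("block.", seq_along(blocks))
  blocks
}

dfA <- data.frame(city = c("Boston", "Boston", "Austin"),
                  birthyear = c(1950, 1950, 1960))
dfB <- data.frame(city = c("Austin", "Boston", "Denver"),
                  birthyear = c(1960, 1950, 1950))

blocks <- exact_block_sketch(dfA, dfB, varnames = c("city", "birthyear"))

# Subset each data set to the members of the first block
dfA_block1 <- dfA[blocks$block.1$dfA.inds, ]
dfB_block1 <- dfB[blocks$block.1$dfB.inds, ]

# Number of record pairs compared within each block
pair_counts <- vapply(blocks, function(b) {
  length(b$dfA.inds) * length(b$dfB.inds)
}, numeric(1))
```

Here blocking reduces the comparison space from 9 candidate pairs (3 x 3) to 3, which is the reason for blocking before a merge.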

Examples

## Not run: 
block_out <- blockData(dfA, dfB, varnames = c("city", "birthyear"))

## End(Not run)

fastLink documentation built on Nov. 17, 2023, 9:06 a.m.
