clusterMatch: clusterMatch
In fastLink: Fast Probabilistic Record Linkage with Missing Data


Description

Creates appropriately sized clusters for matching, using either alphabetical or word-embedding clustering. With word embedding, the function first builds a word embedding from the provided vectors and runs PCA on the resulting matrix. It then keeps the first k PCA dimensions (where k is set by the user through min.var) and runs k-means on those dimensions to obtain the clusters.
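The word-embedding path described above can be sketched in base R. This is a minimal illustration, not fastLink's implementation: the letter-count "embedding", the toy names, and the variable names (vec_a, vec_b, min_var, n_dims) are all assumptions made for the example. Only the overall sequence (embedding matrix, PCA, keep the leading dimensions, k-means) follows the description.

```r
# Toy inputs (hypothetical data, not shipped with fastLink).
vec_a <- c("anna", "anne", "bob", "bobby", "carl", "carla")
vec_b <- c("ana", "bobbie", "karl", "carlos")
all_names <- unique(c(vec_a, vec_b))

# Stand-in "embedding": one row per name, one column per letter (counts).
# ASSUMPTION: fastLink may construct its embedding differently.
embed <- t(vapply(
  all_names,
  function(s) tabulate(match(strsplit(s, "")[[1]], letters), nbins = 26L),
  integer(26)
))
colnames(embed) <- letters

# PCA on the embedding matrix.
pca <- prcomp(embed)

# Share of variance explained by each principal component.
var_share <- pca$sdev^2 / sum(pca$sdev^2)

# Keep the leading components whose share is at least min_var
# (always keep at least one so k-means has something to work on).
min_var <- 0.20
n_dims <- max(1L, sum(var_share >= min_var))
scores <- pca$x[, seq_len(n_dims), drop = FALSE]

# k-means on the retained dimensions.
set.seed(42)
km <- kmeans(scores, centers = 2, iter.max = 100, nstart = 5)

# Map cluster labels back to the original vectors.
cluster_a <- unname(km$cluster[match(vec_a, all_names)])
cluster_b <- unname(km$cluster[match(vec_b, all_names)])
```

Because both vectors are embedded and clustered together, a name in dataset A and a similar name in dataset B can land in the same cluster, which is what makes the clusters usable as blocks for matching.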

Usage

clusterMatch(vecA, vecB, nclusters, max.n, word.embed, min.var, iter.max)

Arguments

`vecA`	The character vector from dataset A
`vecB`	The character vector from dataset B
`nclusters`	The number of clusters to create from the provided data. Specify either nclusters or max.n and leave the other as NULL.
`max.n`	The maximum size of either dataset A or dataset B in the largest cluster. Specify either nclusters or max.n and leave the other as NULL.
`word.embed`	Whether to use word embedding clustering. Default is FALSE.
`min.var`	The minimum amount of explained variance (maximum = 1) a PCA dimension can provide in order to be included in k-means clustering when using word embedding. Default is .20.
`iter.max`	Maximum number of iterations for the k-means algorithm.
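To make the relationship between nclusters and max.n concrete, the sketch below uses two hypothetical helpers (min_clusters_needed and largest_cluster are not part of fastLink). The first computes an arithmetic lower bound on the number of clusters implied by a size cap; the second reports the quantity that max.n constrains for a given pair of assignment vectors. How clusterMatch itself turns max.n into a cluster count is not shown here.

```r
# Hypothetical helper: if no cluster may hold more than max_n records from
# either dataset, at least this many clusters are required.
min_clusters_needed <- function(n_a, n_b, max_n) {
  ceiling(max(n_a, n_b) / max_n)
}

# Hypothetical helper: size of the largest cluster across both datasets,
# i.e. the quantity that max.n is meant to bound.
largest_cluster <- function(cluster_a, cluster_b) {
  max(table(cluster_a), table(cluster_b))
}

# 500 records in A, 350 in B, at most 200 per cluster.
needed <- min_clusters_needed(500, 350, 200)

# Toy assignment vectors: cluster 2 holds three records from B.
biggest <- largest_cluster(c(1, 1, 2, 3), c(1, 2, 2, 2))
```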

Value

clusterMatch returns a list containing:

`clusterA`	The cluster assignments for dataset A
`clusterB`	The cluster assignments for dataset B
`n.clusters`	The number of clusters created
`kmeans`	The k-means object output.
`pca`	The PCA object output.
`dims.pca`	The number of dimensions from PCA used for the k-means clustering.
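One way to use the returned assignments is to split both datasets into per-cluster blocks and match within each block. The sketch below assumes that clusterA and clusterB are integer vectors aligned with the rows of the two datasets; the data frames and the cl list are hypothetical stand-ins, not real clusterMatch output.

```r
# Hypothetical toy data frames.
df_a <- data.frame(firstname = c("anna", "bob", "carl", "anne"),
                   stringsAsFactors = FALSE)
df_b <- data.frame(firstname = c("ana", "bobby", "karl"),
                   stringsAsFactors = FALSE)

# Stand-in for the list returned by clusterMatch (values invented here).
cl <- list(clusterA = c(1L, 2L, 3L, 1L),
           clusterB = c(1L, 2L, 3L),
           n.clusters = 3L)

# One block per cluster: the rows of A and B assigned to that cluster.
blocks <- lapply(seq_len(cl$n.clusters), function(k) {
  list(A = df_a[cl$clusterA == k, , drop = FALSE],
       B = df_b[cl$clusterB == k, , drop = FALSE])
})

# Block sizes, useful for checking that no block is too large to match.
sizes <- vapply(blocks,
                function(b) c(nrow(b$A), nrow(b$B)),
                c(nA = 0L, nB = 0L))
```

Each element of blocks can then be passed to a matching routine on its own, so the number of pairwise comparisons grows with the block sizes rather than with the full datasets.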

Author(s)

Ben Fifield <benfifield@gmail.com>

Examples

data(samplematch)
cl <- clusterMatch(dfA$firstname, dfB$firstname, nclusters = 3)

fastLink documentation built on Nov. 17, 2023, 9:06 a.m.
