clusterMatch: clusterMatch
In kosukeimai/fastLink: Fast Probabilistic Record Linkage with Missing Data

Description

Creates properly sized clusters for matching, using either alphabetical or word embedding clustering. With word embedding, the function first builds a word embedding matrix from the provided vectors and then runs PCA on that matrix. It then takes the first k PCA dimensions (where k is determined by the user-supplied min.var threshold) and runs k-means on them to obtain the clusters.
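The embedding, PCA, and k-means steps described above can be sketched in base R. This is a minimal illustration and not the fastLink implementation: the letter-count "embedding", the toy vector names_all, and the variable names min_var and dims are assumptions made for the example.

```r
# Illustrative sketch only: NOT the fastLink implementation.
set.seed(1)
names_all <- c("anna", "anne", "hanna", "bob", "bobby",
               "rob", "carl", "carla", "karl")

# Hypothetical embedding: how often each letter a-z occurs in each string.
letter_counts <- t(sapply(names_all, function(s) {
  chars <- strsplit(s, "")[[1]]
  table(factor(chars, levels = letters))
}))

# Drop letters that never vary across strings before running PCA.
keep <- apply(letter_counts, 2, var) > 0
emb <- letter_counts[, keep, drop = FALSE]

# PCA on the embedding matrix, then the share of variance per dimension.
pca <- prcomp(emb, center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Keep dimensions explaining at least min_var of the variance (at least one).
min_var <- 0.20
dims <- max(1, sum(var_explained >= min_var))

# k-means on the retained dimensions gives one cluster label per string.
km <- kmeans(pca$x[, seq_len(dims), drop = FALSE],
             centers = 3, iter.max = 100, nstart = 5)
clusters <- km$cluster
```

Here clusters holds one label per input string, so strings with similar letter profiles tend to share a label. A real run would embed the values of both datasets together so that A and B share the same cluster definitions.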

Usage

clusterMatch(vecA, vecB, nclusters, max.n, word.embed, min.var, iter.max)

Arguments

`vecA`	The character vector from dataset A
`vecB`	The character vector from dataset B
`nclusters`	The number of clusters to create from the provided data. Specify either nclusters or max.n and leave the other as NULL.
`max.n`	The maximum size of either dataset A or dataset B in the largest cluster. Specify either nclusters or max.n and leave the other as NULL.
`word.embed`	Whether to use word embedding clustering. Default is FALSE.
`min.var`	The minimum proportion of explained variance (maximum = 1) that a PCA dimension must provide in order to be included in k-means clustering when using word embedding. Default is 0.20.
`iter.max`	Maximum number of iterations for the k-means algorithm.

Value

clusterMatch returns a list containing the following elements:

`clusterA`	The cluster assignments for dataset A
`clusterB`	The cluster assignments for dataset B
`n.clusters`	The number of clusters created
`kmeans`	The k-means object output.
`pca`	The PCA object output.
`dims.pca`	The number of dimensions from PCA used for the k-means clustering.
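One way the cluster assignments might be used is to split both datasets into blocks and match within each block. The sketch below assumes that clusterA and clusterB are integer label vectors aligned with the rows of datasets A and B; the list cl is a hand-built stand-in for the function's output, and the toy data frames are hypothetical.

```r
# Toy data and a hand-built stand-in for the clusterMatch() return value.
# Assumption: clusterA / clusterB hold one integer label per row of A / B.
dfA <- data.frame(firstname = c("anna", "bob", "anne", "carl"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(firstname = c("bobby", "rob", "hanna", "karl", "ann"),
                  stringsAsFactors = FALSE)
cl <- list(clusterA = c(1L, 2L, 1L, 3L),
           clusterB = c(2L, 2L, 1L, 3L, 1L),
           n.clusters = 3L)

# One block per cluster: the rows of A and B that share the same label.
blocks <- lapply(seq_len(cl$n.clusters), function(k) {
  list(A = dfA[cl$clusterA == k, , drop = FALSE],
       B = dfB[cl$clusterB == k, , drop = FALSE])
})

# Block sizes, useful for checking that no block is too large to match.
block_sizes <- t(sapply(blocks, function(b) c(nA = nrow(b$A), nB = nrow(b$B))))
```

Each element of blocks can then be passed to a matching routine on its own, which keeps the number of pairwise comparisons per run small.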

Author(s)

Ben Fifield <benfifield@gmail.com>

Examples

library(fastLink)
data(samplematch)
cl <- clusterMatch(dfA$firstname, dfB$firstname, nclusters = 3)

kosukeimai/fastLink documentation built on Nov. 17, 2023, 8:11 p.m.
