
Creates properly sized clusters for matching, using either
alphabetical or word embedding clustering. If using word embedding,
the function first builds a word embedding matrix from the provided
vectors and then runs PCA on that matrix. It then keeps the first `k`
dimensions (where `k` is provided by the user) and runs k-means on
the reduced matrix to obtain the clusters.
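The word embedding path described above can be sketched in base R. This is only an illustration of the embed, PCA, k-means sequence, not fastLink's internal code: the letter-count embedding, the helper `embed_letters`, the toy name vectors, and the fixed `k <- 2` are all assumptions made for the example.

```r
# Illustrative sketch only; fastLink's actual embedding may differ.
# Assumption: each string is embedded as the counts of its letters a-z.
embed_letters <- function(x) {
  x <- tolower(x)
  m <- t(vapply(x, function(s) {
    chars <- strsplit(s, "", fixed = TRUE)[[1]]
    as.numeric(table(factor(chars, levels = letters)))
  }, numeric(26)))
  dimnames(m) <- list(NULL, letters)
  m
}

# Toy inputs (invented for this example)
vecA <- c("john", "jon", "mary", "maria", "peter", "pete")
vecB <- c("johnny", "marie", "petra", "jonathan")

# Embed both vectors together so they share one feature space
emb <- embed_letters(c(vecA, vecB))
emb <- emb[, colSums(emb) > 0, drop = FALSE]  # drop letters never used

# PCA on the embedding matrix, then keep the first k dimensions
pca <- prcomp(emb, center = TRUE, scale. = FALSE)
k <- 2
scores <- pca$x[, seq_len(k), drop = FALSE]

# k-means on the reduced matrix
set.seed(1)
km <- kmeans(scores, centers = 3, iter.max = 50, nstart = 5)

# Split the assignments back into the two datasets
clusterA <- km$cluster[seq_along(vecA)]
clusterB <- km$cluster[length(vecA) + seq_along(vecB)]
```

Embedding the two vectors jointly is what lets a record in dataset A and a record in dataset B land in the same cluster.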

```
clusterMatch(vecA, vecB, nclusters, max.n, word.embed, min.var,
             weighted.kmeans, iter.max)
```

| Argument | Description |
|---|---|
| `vecA` | The character vector from dataset A. |
| `vecB` | The character vector from dataset B. |
| `nclusters` | The number of clusters to create from the provided data. Either `nclusters = NULL` or `max.n = NULL`. |
| `max.n` | The maximum size of either dataset A or dataset B in the largest cluster. Either `nclusters = NULL` or `max.n = NULL`. |
| `word.embed` | Whether to use word embedding clustering. Default is `FALSE`. |
| `min.var` | The minimum amount of explained variance (maximum = 1) a PCA dimension can provide in order to be included in k-means clustering when using word embedding. Default is 0.20. |
| `weighted.kmeans` | Whether to weight the k-means algorithm features by the explained variance of the included principal component when using word embedding clustering. Default is `FALSE`. |
| `iter.max` | Maximum number of iterations for the k-means algorithm. |
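To make `min.var` and `weighted.kmeans` more concrete, here is a rough base R sketch of one plausible reading of those two arguments: components are kept when their share of explained variance reaches `min.var`, and the kept scores are scaled by that share before clustering. The simulated matrix `x` and the exact weighting scheme are assumptions; fastLink's internals may differ.

```r
# Hypothetical illustration of `min.var` and `weighted.kmeans`;
# this is not fastLink's own implementation.
set.seed(42)
x <- matrix(rnorm(200 * 5), ncol = 5)  # simulated stand-in for an embedding
x[, 1] <- x[, 1] * 4                   # inflate the variance of two columns
x[, 2] <- x[, 2] * 3

pca <- prcomp(x, center = TRUE)
prop.var <- pca$sdev^2 / sum(pca$sdev^2)  # share of variance per component

# Keep only components explaining at least `min.var` of the variance
min.var <- 0.20
keep <- which(prop.var >= min.var)
scores <- pca$x[, keep, drop = FALSE]

# Optionally weight each kept component by its explained variance
weighted.kmeans <- TRUE
if (weighted.kmeans) {
  scores <- sweep(scores, 2, prop.var[keep], `*`)
}

km <- kmeans(scores, centers = 3, iter.max = 100, nstart = 5)
table(km$cluster)
```

Under this reading, weighting makes distances along high-variance components count for more in the k-means objective, while `iter.max` is passed straight through to `kmeans()`.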

`clusterMatch` returns a list with the following elements:

| Element | Description |
|---|---|
| `clusterA` | The cluster assignments for dataset A. |
| `clusterB` | The cluster assignments for dataset B. |
| `n.clusters` | The number of clusters created. |
| `kmeans` | The k-means object output. |
| `pca` | The PCA object output. |
| `dims.pca` | The number of dimensions from PCA used for the k-means clustering. |

Ben Fifield <[email protected]>

```
library(fastLink)
# Load the sample data frames dfA and dfB shipped with the package
data(samplematch)
cl <- clusterMatch(dfA$firstname, dfB$firstname, nclusters = 3)
```
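Once the cluster assignments are available, a common next step is to split each dataset by cluster so that matching can be run within clusters. The sketch below uses a mock list with the element names documented above rather than a real `clusterMatch` result; the data frames, the assignments, and the assumption that `clusterA` and `clusterB` are integer-like vectors aligned with the input rows are all invented for illustration.

```r
# Mock object using the documented element names; values are made up.
dfA <- data.frame(firstname = c("anna", "bob", "carl", "dana"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(firstname = c("ann", "bobby", "dan"),
                  stringsAsFactors = FALSE)
cl <- list(clusterA = c(1, 1, 2, 2),
           clusterB = c(1, 1, 2),
           n.clusters = 2)

# Split each dataset into per-cluster blocks
blocksA <- split(dfA, cl$clusterA)
blocksB <- split(dfB, cl$clusterB)

# Report the block sizes; matching would then run within each pair
for (i in seq_len(cl$n.clusters)) {
  key <- as.character(i)
  cat(sprintf("cluster %d: %d rows from A, %d rows from B\n",
              i, nrow(blocksA[[key]]), nrow(blocksB[[key]])))
}
```

Each pair `blocksA[[key]]`, `blocksB[[key]]` can then be handed to whatever matching routine is in use, which keeps the number of pairwise comparisons per run small.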

fastLink documentation built on Nov. 12, 2018, 5:05 p.m.
