varclust: Variable Clustering with Multiple Latent Components...



Variable Clustering with Multiple Latent Components

Clustering is based on the k-means algorithm. In each step, the center of a cluster is a small number of principal components computed from the variables in that cluster. The similarity between a variable and a cluster center is measured by R^2, obtained from a least-squares regression of the variable on those components.
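To make the R^2-based similarity concrete, here is a minimal base R sketch. It is not the varclust implementation: the helper names cluster_center and r_squared, the simulated data, and the choice of two components are all assumptions made for illustration.

```r
# Illustrative sketch only; not the varclust implementation.
set.seed(1)
n <- 100

# Two latent factors generate a cluster of six variables (simulated data).
factors <- matrix(rnorm(n * 2), n, 2)
cluster_vars <- factors %*% matrix(rnorm(2 * 6), 2, 6) +
  0.1 * matrix(rnorm(n * 6), n, 6)

# Cluster "center": scores of the first q principal components of the cluster.
cluster_center <- function(vars, q) {
  prcomp(vars, center = TRUE, scale. = TRUE)$x[, seq_len(q), drop = FALSE]
}

# Similarity of a variable to a center: R^2 of its least-squares fit
# on the center's components.
r_squared <- function(y, center) {
  summary(lm(y ~ center))$r.squared
}

center <- cluster_center(cluster_vars, q = 2)
inside <- as.vector(factors %*% c(1, -1)) + 0.1 * rnorm(n)  # same factors
outside <- rnorm(n)                                         # unrelated noise

r2_inside <- r_squared(inside, center)
r2_outside <- r_squared(outside, center)
```

A variable driven by the same latent factors should get an R^2 close to 1, while an unrelated variable should get an R^2 close to 0, which is what lets R^2 play the role of a distance in the k-means step.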


The main function of the package varclust is mlcc.bic, which clusters the variables of a data set when the number of clusters is unknown. The partition of variables is computed with a k-means based algorithm. The number of clusters and their dimensions are selected using the BIC criterion. If the number of clusters is known, one can use the function mlcc.reps, which takes the number of clusters as a parameter. For mlcc.reps one can also specify an initial segmentation for the k-means algorithm. This can be useful if the user has some a priori knowledge about the clustering.
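The k-means based partitioning can be pictured with the following base R sketch. It is only an illustration under simplifying assumptions: the number of clusters k and the dimension q are fixed by hand (the package selects them with BIC), the data are simulated, and names such as make_block and segmentation are hypothetical rather than taken from varclust.

```r
# Illustrative sketch only; fixed number of clusters k and fixed dimension q.
set.seed(2)
n <- 100
make_block <- function(p) {   # p variables driven by one latent factor
  f <- rnorm(n)
  outer(f, runif(p, 0.5, 1.5)) + 0.2 * matrix(rnorm(n * p), n, p)
}
X <- scale(cbind(make_block(10), make_block(10)))
truth <- rep(1:2, each = 10)

k <- 2
q <- 1
segmentation <- sample(rep(1:k, length.out = ncol(X)))  # random start

for (iter in 1:20) {
  # Centers: first q principal components of each cluster's variables.
  centers <- lapply(1:k, function(j) {
    vars <- X[, segmentation == j, drop = FALSE]
    prcomp(vars)$x[, seq_len(q), drop = FALSE]
  })
  # Similarity: R^2 of each variable regressed on each center.
  r2 <- sapply(centers, function(ctr) {
    apply(X, 2, function(y) summary(lm(y ~ ctr))$r.squared)
  })
  # Reassign every variable to the center that explains it best.
  new_segmentation <- max.col(r2, ties.method = "first")
  # Sketch limitation: stop instead of handling an empty cluster.
  if (length(unique(new_segmentation)) < k) break
  if (all(new_segmentation == segmentation)) break
  segmentation <- new_segmentation
}
table(truth, segmentation)
```

Replacing the random start with a user-supplied vector of labels is the idea behind passing an initial segmentation: the loop then refines prior knowledge instead of starting from scratch.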

We also provide the function misclassification, which computes the misclassification rate between two partitions. This performance measure is widely used in image segmentation.
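As an illustration of what a misclassification rate between two partitions can look like, the sketch below uses a simple majority-overlap rule in base R. This is an assumed definition chosen for clarity; the package's misclassification function may match clusters differently, and the name misclassification_rate is hypothetical.

```r
# Illustrative sketch only; varclust's misclassification() may use a
# different matching rule.
misclassification_rate <- function(estimated, reference) {
  stopifnot(length(estimated) == length(reference))
  counts <- table(estimated, reference)
  # Map each estimated cluster to the reference cluster it overlaps most,
  # then count the items that fall outside that overlap.
  matched <- sum(apply(counts, 1, max))
  1 - matched / length(estimated)
}

reference <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
estimated <- c(2, 2, 2, 1, 1, 3, 3, 3, 3)  # labels permuted, one item misplaced
rate <- misclassification_rate(estimated, reference)
```

Because the rule works on overlaps rather than on label values, a pure relabeling of clusters gives a rate of 0, and here the single misplaced item out of nine gives a rate of 1/9.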

Version: 0.9.21


Piotr Sobczyk, Julie Josse

Maintainer: Piotr Sobczyk [email protected]


Piotr Sobczyk, Malgorzata Bogdan, Julie Josse, Clustering around latent variables - a technical report, 2014,


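library(varclust)
# Simulate data, then cluster with an unknown and with a known number of clusters
sim.data <- data.simulation(n=100, SNR=1, K=5, numb.vars=30, max.dim=2)
mlcc.bic(sim.data$X, numb.clusters=1:5, numb.runs=20)
mlcc.reps(sim.data$X, numb.clusters=5, numb.runs=20)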

psobczyk/public_varclust documentation built on May 24, 2017, 12:20 p.m.