
Package varclust performs clustering of variables according to a probabilistic model which assumes that each cluster lies in a low-dimensional subspace. The segmentation of variables, the number of clusters and their dimensions are selected using an appropriate version of the Bayesian Information Criterion.

The best candidate models are identified by a specific implementation of the K-means algorithm, in which cluster centers are represented by a number of orthogonal factors (principal components of the variables within a cluster), and the similarity between a given variable and a cluster center depends on the residuals from a linear model fit. Following the Bayesian Information Criterion (BIC), the sums of squares of residuals are appropriately scaled, which prevents clusters with larger dimensions from attracting variables excessively. To reduce the chance of reaching a local minimum of the modified BIC (mBIC) instead of the global one, for every fixed number of clusters in a given range the K-means algorithm is run a large number of times, with different random initializations of the cluster centers.
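The K-means variant described above can be sketched in base R as follows. This is a hypothetical illustration, not the varclust implementation: the helper names (`cluster_factors`, `bic_score`, `subspace_kmeans`, `best_of_runs`) are made up, the cluster dimensions `dims` are fixed by hand instead of being estimated with PESEL, and the per-variable score is an assumed plain BIC-style quantity, n·log(RSS/n) + d·log(n), rather than the package's exact mBIC.

```r
# Hypothetical sketch, NOT varclust's implementation: K-means over variables
# where each cluster center is a few principal components of its members.

# Leading principal component scores of the variables in one cluster.
cluster_factors <- function(X, members, d) {
  if (length(members) == 0) members <- sample(ncol(X), 1)  # reseed an empty cluster
  sub <- X[, members, drop = FALSE]
  d <- min(d, ncol(sub), nrow(sub))
  prcomp(sub, center = TRUE)$x[, seq_len(d), drop = FALSE]
}

# Assumed BIC-style score of regressing variable x on the factors (lower is better);
# the log(n) penalty keeps larger-dimensional clusters from attracting everything.
bic_score <- function(x, factors) {
  n <- length(x)
  rss <- sum(lm.fit(cbind(1, factors), x)$residuals^2)
  n * log(max(rss, .Machine$double.eps) / n) + ncol(factors) * log(n)
}

# One K-means run starting from a random initial segmentation.
subspace_kmeans <- function(X, K, dims, max.iter = 30) {
  p <- ncol(X)
  labels <- sample(rep_len(seq_len(K), p))
  for (iter in seq_len(max.iter)) {
    centers <- lapply(seq_len(K), function(k)
      cluster_factors(X, which(labels == k), dims[k]))
    # p x K matrix: score of every variable against every cluster center
    scores <- sapply(seq_len(K), function(k)
      apply(X, 2, bic_score, factors = centers[[k]]))
    new.labels <- max.col(-scores, ties.method = "first")
    converged <- all(new.labels == labels)
    labels <- new.labels
    if (converged) break
  }
  list(labels = labels, score = sum(scores[cbind(seq_len(p), labels)]))
}

# Repeat with different random initializations and keep the best total score.
best_of_runs <- function(X, K, dims, numb.runs = 10) {
  runs <- replicate(numb.runs, subspace_kmeans(X, K, dims), simplify = FALSE)
  runs[[which.min(sapply(runs, `[[`, "score"))]]$labels
}

# Usage on toy data: 10 variables driven by factor f1, 10 by factor f2.
set.seed(1)
n <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- cbind(sapply(1:10, function(i) f1 * runif(1, 1, 2) + rnorm(n, sd = 0.1)),
           sapply(1:10, function(i) f2 * runif(1, 1, 2) + rnorm(n, sd = 0.1)))
labels <- best_of_runs(X, K = 2, dims = c(1, 1))
```

Keeping the run with the lowest total score mirrors the idea of restarting K-means many times to avoid a poor local minimum; the penalty term only matters when clusters have different dimensions.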

The main function of package varclust is `mlcc.bic`, which clusters the variables in a dataset with an unknown number of clusters. The variable partition is computed with a k-means based algorithm. The number of clusters and their dimensions are estimated using mBIC and PESEL, respectively. If the number of clusters is known, one might use the function `mlcc.reps`, which takes the number of clusters as a parameter. For `mlcc.reps` one may also specify an initial segmentation for the k-means algorithm. This can be useful if the user has some a priori knowledge about the clustering.

We also provide two functions to simulate datasets with the described structure. The function `data.simulation` generates data in which the subspaces are independent, and `data.simulation.factors` generates data in which some factors are shared between the subspaces.
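To make the simulated structure concrete, here is a base-R sketch of data whose variables fall into K blocks, each spanned by its own low-dimensional set of factors plus Gaussian noise. The function `simulate_subspaces` and its `n.shared` argument are hypothetical and do not reproduce `data.simulation` or `data.simulation.factors` exactly; in particular, defining SNR as the ratio of per-variable signal variance to noise variance is an assumption.

```r
# Hypothetical sketch, NOT varclust's data.simulation: each block of numb.vars
# variables lies (up to noise) in a subspace spanned by its own dims[k] factors
# plus n.shared factors common to all blocks (n.shared = 0: independent subspaces).
simulate_subspaces <- function(n, K, numb.vars, dims, SNR = 1, n.shared = 0) {
  shared <- matrix(rnorm(n * n.shared), n, n.shared)
  blocks <- lapply(seq_len(K), function(k) {
    factors <- cbind(matrix(rnorm(n * dims[k]), n, dims[k]), shared)
    coefs <- matrix(rnorm(ncol(factors) * numb.vars), ncol(factors), numb.vars)
    signal <- scale(factors %*% coefs)  # assumed: unit signal variance per variable
    # noise variance 1 / SNR, so SNR = signal variance / noise variance
    signal + matrix(rnorm(n * numb.vars, sd = 1 / sqrt(SNR)), n, numb.vars)
  })
  list(X = do.call(cbind, blocks), labels = rep(seq_len(K), each = numb.vars))
}

set.seed(2)
sim <- simulate_subspaces(n = 50, K = 3, numb.vars = 10, dims = c(1, 2, 3))
dim(sim$X)  # 50 30
```

Setting `n.shared` to a positive value imitates the shared-factor setting: the blocks' subspaces then overlap, which makes the clustering problem harder.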

We also provide functions that measure the quality of a clustering. `misclassification` computes the misclassification rate between two partitions. This performance measure is extensively used in image segmentation. The other measure is implemented as the `integration` function.
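One common way to define a misclassification rate between two partitions is the fraction of variables assigned differently, minimized over all relabelings of the clusters, since cluster labels are arbitrary. The base-R sketch below follows that definition; `misclassification_rate` and `permutations` are hypothetical helpers, not the package's `misclassification` (whose arguments may differ), and brute force over all K! permutations is only practical for small K.

```r
# Hypothetical helper: all permutations of a vector, as a list.
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (rest in permutations(v[-i])) out[[length(out) + 1]] <- c(v[i], rest)
  out
}

# Fraction of mismatched variables under the best relabeling of `group`.
misclassification_rate <- function(group, true.group) {
  K <- max(length(unique(group)), length(unique(true.group)))
  g <- match(group, unique(group))            # recode labels to 1..k
  tg <- match(true.group, unique(true.group))
  min(sapply(permutations(seq_len(K)), function(perm) mean(perm[g] != tg)))
}

misclassification_rate(c(1, 1, 1, 2), c(2, 2, 1, 1))  # 0.25
```

For larger K, the optimal relabeling can be found with the Hungarian algorithm instead of enumerating permutations.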

Version: 0.9.4

Piotr Sobczyk, Stanislaw Wilczynski, Julie Josse, Malgorzata Bogdan

Maintainer: Piotr Sobczyk pj.sobczyk@gmail.com

```
library(varclust)
sim.data <- data.simulation(n = 50, SNR = 1, K = 3, numb.vars = 50, max.dim = 3)
# unknown number of clusters: search over 1 to 5 clusters using mBIC
mlcc.bic(sim.data$X, numb.clusters = 1:5, numb.runs = 20, numb.cores = 1, verbose = TRUE)
# known number of clusters
mlcc.reps(sim.data$X, numb.clusters = 3, numb.runs = 20, numb.cores = 1)
```
