Variable selection in model-based clustering with the latent class model, for continuous data with missing values.

Description

The R package VarSelLCM uses a finite mixture model to perform cluster analysis with variable selection on continuous data, assuming that the variables are independent within each class. The package handles datasets with missing values by assuming that the values are missing at random. The one-dimensional marginals of the components follow Gaussian distributions, which facilitates both model interpretation and model selection. Variable selection is performed by an alternating optimization procedure that maximizes the MICL criterion. Maximum likelihood inference for the selected model is then carried out by an EM algorithm. The package also imputes missing values.
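
As a sketch of the model described above (not a formula quoted from the package), the density of an observation under a mixture with within-class independence and Gaussian marginals can be written as follows. The notation is introduced here for illustration only: g is the number of classes, d the number of variables, \pi_k the class proportions, and \phi(\cdot\,; \mu_{kj}, \sigma_{kj}^2) the univariate Gaussian density with mean \mu_{kj} and variance \sigma_{kj}^2.

```latex
f(x_i) = \sum_{k=1}^{g} \pi_k \prod_{j=1}^{d} \phi\left(x_{ij};\, \mu_{kj},\, \sigma_{kj}^{2}\right),
\qquad \pi_k > 0, \quad \sum_{k=1}^{g} \pi_k = 1 .
```

Because the density factorizes over variables within each class, an observation with missing entries contributes to the likelihood through the product over its observed variables only.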

Details

Package: VarSelLCM
Type: Package
Version: 1.2
Date: 2015-06-08
License: GPL (>= 2)

The main functions to use are VarSelCluster and VarSelImputation.

Function VarSelCluster carries out model selection by maximizing the MICL criterion, and then performs maximum likelihood estimation of the selected model via an EM algorithm.
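
To illustrate the estimation step only, here is a minimal base R sketch of an EM algorithm for a Gaussian mixture with independent variables within each class. It is not the package's implementation: it assumes complete data, performs no variable selection, and uses simulated data, a fixed number of iterations, and variable names (prop, mu, sigma, tik) chosen for this example.

```r
set.seed(1)
# Simulated data: two classes, two variables (illustrative only)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))
n <- nrow(x); d <- ncol(x); g <- 2

# Crude initialization: two observations are used as class centres
prop  <- rep(1 / g, g)
mu    <- x[c(1, n), , drop = FALSE]   # g x d matrix of class means
sigma <- matrix(1, g, d)              # g x d matrix of standard deviations

loglik <- numeric(0)
for (iter in 1:50) {
  # E-step: log(prop_k) + sum_j log N(x_ij; mu_kj, sigma_kj^2), an n x g matrix
  logp <- sapply(1:g, function(k) {
    dens <- sapply(1:d, function(j) dnorm(x[, j], mu[k, j], sigma[k, j], log = TRUE))
    log(prop[k]) + rowSums(dens)
  })
  m <- apply(logp, 1, max)            # row maxima, for numerical stability
  loglik[iter] <- sum(m + log(rowSums(exp(logp - m))))
  tik <- exp(logp - m)
  tik <- tik / rowSums(tik)           # posterior class probabilities

  # M-step: weighted proportions, means and standard deviations
  nk   <- colSums(tik)
  prop <- nk / n
  mu   <- t(tik) %*% x / nk
  sigma <- t(sapply(1:g, function(k) {
    sqrt(colSums(tik[, k] * (x - rep(mu[k, ], each = n))^2) / nk[k])
  }))
}
```

The vector loglik stores the observed-data log-likelihood evaluated at the start of each iteration, which gives a simple way to monitor convergence.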

Function VarSelImputation imputes missing values by taking their expectation conditional on the model, its parameters, and the observed variables.
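
The conditional expectation described above can be sketched in base R. This is an illustration, not the package's code: it assumes a Gaussian mixture with independent variables within each class, so the expected value of a missing entry is the average of the class means weighted by the posterior class probabilities computed from the observed entries. The helper impute_row and all parameter values are hypothetical.

```r
# Hypothetical parameters of a two-class model with three variables
# (illustrative values, not estimated from any dataset)
prop  <- c(0.4, 0.6)                    # class proportions
mu    <- rbind(c(0, 0, 0), c(3, 3, 3))  # class means (rows = classes)
sigma <- rbind(c(1, 1, 1), c(1, 1, 1))  # class standard deviations

# Hypothetical helper: impute the NA entries of one observation
impute_row <- function(x, prop, mu, sigma) {
  obs <- !is.na(x)
  # log(prop_k) plus the log-densities of the observed entries in class k
  logw <- sapply(seq_along(prop), function(k) {
    log(prop[k]) + sum(dnorm(x[obs], mu[k, obs], sigma[k, obs], log = TRUE))
  })
  # Posterior class probabilities given the observed entries
  post <- exp(logw - max(logw))
  post <- post / sum(post)
  # Conditional expectation of each missing entry: posterior-weighted class means
  x[!obs] <- as.vector(post %*% mu[, !obs, drop = FALSE])
  x
}

x <- c(2.9, NA, 3.2)
x_imputed <- impute_row(x, prop, mu, sigma)
```

Here the observed entries lie close to the means of the second class, so the imputed value is pulled towards that class's mean for the missing variable.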

The summary and print methods are available to facilitate interpretation of the results.

Author(s)

Matthieu Marbac and Mohammed Sedki

Maintainer: Mohammed Sedki <mohammed.sedki@u-psud.fr>

References

M. Marbac and M. Sedki (2015). Variable Selection for Model-Based Clustering using the Integrated Completed-Data Likelihood. Preprint.

Examples

## Not run: 
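# Load the package and the banknote dataset
library(VarSelLCM)
data(banknote)

# Cluster into 2 classes with variable selection (the first column is excluded)
results <- VarSelCluster(banknote[,-1], 2, nbcores=2, initModel=40)

# Summarize the fitted model
summary(results)

# Display the fitted model
print(results)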

## End(Not run)