VarSelLCM-package: Variable Selection in model-based clustering managed by the...

Description Details Author(s) References Examples

Description

The package uses a finite mixture model for analyzing mixed-type data (data with continuous and/or count and/or categorical variables) with missing values (missing at random) by assuming independence between classes. The one-dimensional marginals of the components follow standard distributions for facilitating both the model interpretation and the model selection. The variable selection is led by an alternated optimization procedure for maximizing the MICL criterion. The maximum likelihood inference is done by an EM algorithm for the selected model. This package also performs the imputation of missing values.

Details

Package: VarSelLCM
Type: Package
Version: 2.0.0
Date: 2016-04-18
License: GPL-2
LazyLoad: yes
URL: http://varsellcm.r-forge.r-project.org/

The main function to use is VarSelCluster.

Function VarSelCluster carries out the model selection by maximizing the MICL criterion, then it performs the maximum likelihood estimation of the selected model via an EM algorithm.

Tool methods summary, print and plot are available for facilitating the interpretation.

Author(s)

Matthieu Marbac and Mohammed Sedki Maintainer: Mohammed Sedki <[email protected]>

References

M. Marbac and M. Sedki (2015). Variable selection for model-based clustering using the integrated completed-data likelihood. Preprint.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
## Not run: 
# Package loading
require(VarSelLCM)

# Data loading:
# x contains the observed variables
# z the known statu (i.e. 1: absence and 2: presence of heart disease)
data(heart)
z <- heart[,"Class"]
x <- heart[,-13]

# Cluster analysis without variable selection
res_without <- VarSelCluster(x, 2, vbleSelec = FALSE)

# Cluster analysis with variable selection (with parallelisation)
res_with <- VarSelCluster(x, 2, nbcores = 2, initModel=40)

# Confusion matrices: variable selection decreases the misclassification error rate
print(table(z, res_without@partitions@zMAP))
print(table(z, res_with@partitions@zMAP))

# Summary of the best model
summary(res_with)

# Parameters of the best model
print(res_with)

# Plot of the best model
plot(res_with)


## End(Not run)

VarSelLCM documentation built on Nov. 17, 2017, 7:51 a.m.