SelvarClustLasso: Regularization for variable selection in model-based...


Description

This function implements variable selection in model-based clustering using a lasso-based ranking of the variables, as described in Sedki et al. (2014). The variable ranking step uses the penalized EM algorithm of Zhou et al. (2009).
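For orientation, the criterion behind the ranking step can be written as below. This is a sketch in our own notation, assuming centered data and following the form of the penalty in Zhou et al. (2009); the score O_K(j) reflects our reading of the ranking step in Sedki et al. (2014) and is not a transcription of the package code. Here λ and ρ play the roles of the lambda and rho arguments, and G_λ and G_ρ denote the grids of values supplied for them.

```latex
% Penalized log-likelihood for K clusters, data x_1, ..., x_n in R^p (centered)
\ell_P(\theta) = \sum_{i=1}^{n} \log\Big( \sum_{k=1}^{K} \pi_k \, \phi(x_i \mid \mu_k, \Sigma_k) \Big)
  - \lambda \sum_{k=1}^{K} \lVert \mu_k \rVert_1
  - \rho \sum_{k=1}^{K} \lVert \Sigma_k^{-1} \rVert_1,
\qquad \lVert A \rVert_1 = \sum_{j,j'} \lvert a_{jj'} \rvert .

% Ranking score of variable j for a fixed K, over the grid G_lambda x G_rho
O_K(j) = \sum_{(\lambda,\rho) \in G_\lambda \times G_\rho}
  \mathbb{1}\big\{ \exists\, k : \hat{\mu}_{kj}(\lambda,\rho) \neq 0 \big\}
```

Under this reading, variables are ranked by decreasing O_K(j): a variable whose cluster means escape the ℓ1 shrinkage to zero for many (λ, ρ) pairs is treated as more likely to be relevant for clustering.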

Usage

SelvarClustLasso(data, nbCluster, lambda, rho, hybrid.size, criterion, 
                 models, regModel, indepModel, nbCores)

Arguments

data

matrix containing quantitative data. Rows correspond to observations and columns correspond to variables.

nbCluster

numeric vector of candidate numbers of clusters (must be positive integers)

lambda

numeric vector of values of the tuning parameter for the ℓ1 penalty on the cluster means

rho

numeric vector of values of the tuning parameter for the ℓ1 penalty on the precision matrix

hybrid.size

optional parameter that makes the hybrid forward and backward algorithms used to select the S and W sets less strict

criterion

character vector defining the criterion used to select the best model; the best model is the one with the highest criterion value. Possible values: "BIC", "ICL" or c("BIC", "ICL"). Default is "BIC".

models

an Rmixmod [Model] object defining the list of models to run. By default, the models Gaussian_pk_L_C, Gaussian_pk_Lk_C, Gaussian_pk_L_Ck and Gaussian_pk_Lk_Ck are used (see mixmodGaussianModel() in the Rmixmod package to specify other models).

regModel

character vector defining the covariance matrix form for the linear regression of U on the set R of variables: "LI" for spherical form, "LB" for diagonal form and "LC" for general form. Possible values: "LI", "LB", "LC", c("LI", "LB"), c("LI", "LC"), c("LB", "LC") and c("LI", "LB", "LC"). Default is c("LI", "LB", "LC").

indepModel

character vector defining the covariance matrix form for the independent variables W: "LI" for spherical form and "LB" for diagonal form. Possible values: "LI", "LB" or c("LI", "LB"). Default is c("LI", "LB").

nbCores

number of CPU cores to use for parallel computation (default is 2)
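To make the covariance forms of regModel and indepModel concrete, the base-R sketch below estimates a q × q residual covariance under each form by maximum likelihood, assuming an n × q matrix of centered residuals (for U after regression on R) or of centered variables (for W). The helper residual_cov and the simulated data are hypothetical illustrations, not SelvarMix functions.

```r
# Hypothetical helper (not part of SelvarMix): ML estimate of a q x q
# covariance matrix from an n x q matrix E of centered residuals, under the
# three forms named by regModel ("LI", "LB", "LC"); indepModel uses LI or LB.
residual_cov <- function(E, form = c("LI", "LB", "LC")) {
  form <- match.arg(form)
  n <- nrow(E)
  q <- ncol(E)
  switch(form,
    LI = diag(sum(E^2) / (n * q), q),  # spherical: sigma^2 * I, 1 parameter
    LB = diag(colSums(E^2) / n, q),    # diagonal: q separate variances
    LC = crossprod(E) / n)             # general: q(q + 1)/2 free parameters
}

# Simulated example: q = 3 "redundant" variables regressed on 2 "regressors"
set.seed(1)
n <- 200
X_R <- matrix(rnorm(n * 2), n, 2)
B <- matrix(c(1, -1, 0.5, 2, 0, 1), nrow = 2)
Y_U <- X_R %*% B + matrix(rnorm(n * 3), n, 3)
E <- residuals(lm(Y_U ~ X_R))  # multivariate least-squares residuals (n x 3)

covs <- lapply(c(LI = "LI", LB = "LB", LC = "LC"),
               function(f) residual_cov(E, f))
covs$LB
```

Because every response shares the same regressors, the least-squares coefficients are also the maximum-likelihood ones under each form, so only the covariance estimate changes between "LI", "LB" and "LC".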

Value

For each requested criterion (BIC or ICL), the following components are returned:

S

The selected set of relevant clustering variables

R

The selected subset of regressors

U

The selected set of redundant variables

W

The selected set of independent variables

criterionValue

The criterion value for the selected model

nbCluster

The selected number of clusters

model

The selected Gaussian mixture form

regModel

The selected covariance form for the regression

indepModel

The selected covariance form for the independent Gaussian distribution

proba

Matrix containing the conditional probabilities of belonging to each cluster for all observations

partition

Vector of length n containing the cluster assignments of the n observations according to the Maximum-a-Posteriori rule
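The Maximum-a-Posteriori rule behind partition assigns each observation to the cluster with the largest conditional probability in proba. A minimal base-R sketch, assuming proba is an n × K numeric matrix as described above; the helper map_partition and the toy matrix are hypothetical:

```r
# Hypothetical helper: Maximum-a-Posteriori assignment from an n x K matrix
# of conditional membership probabilities (one row per observation).
map_partition <- function(proba) {
  stopifnot(is.matrix(proba), all(abs(rowSums(proba) - 1) < 1e-8))
  # which.max returns the first maximum, so ties go to the lower cluster index
  apply(proba, 1, which.max)
}

# Toy probabilities for 4 observations and 3 clusters (illustrative only)
proba_toy <- rbind(c(0.70, 0.20, 0.10),
                   c(0.05, 0.90, 0.05),
                   c(0.30, 0.30, 0.40),
                   c(0.50, 0.50, 0.00))
map_partition(proba_toy)
# [1] 1 2 3 1
```

For large n, max.col(proba, ties.method = "first") gives the same assignments without an explicit loop over rows.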

Author(s)

Mohammed Sedki <mohammed.sedki@u-psud.fr>

References

Zhou, H., Pan, W., and Shen, X., 2009. "Penalized model-based clustering with unconstrained covariance matrices". Electronic Journal of Statistics, vol. 3, pp. 1473-1496.

Maugis, C., Celeux, G., and Martin-Magniette, M. L., 2009. "Variable selection in model-based clustering: A general variable role modeling". Computational Statistics and Data Analysis, vol. 53, no. 11, pp. 3872-3882.

Sedki, M., Celeux, G., and Maugis-Rabusseau, C., 2014. "SelvarMix: A R package for variable selection in model-based clustering and discriminant analysis with a regularization approach". Inria Research Report, available at http://hal.inria.fr/hal-01053784

See Also

SelvarLearnLasso, SortvarClust, SortvarLearn, scenarioCor

Examples

## Not run: 
## Simulated data example as shown in Maugis et al. (2009)
## n = 2000 observations, p = 14 variables
library(Rmixmod)
library(glasso)
data(scenarioCor)
data.cor <- scenarioCor[, 1:14]

## Grids for the l1 penalties on the cluster means (lambda) and precision matrix (rho)
lambda <- seq(20, 100, by = 10)
rho <- seq(1, 2, length.out = 2)
hybrid.size <- 3
nbCluster <- c(3, 4)
criterion <- "BIC"
## Spherical Gaussian mixture models (see ?mixmodGaussianModel in Rmixmod)
models <- mixmodGaussianModel(family = "spherical", equal.proportions = TRUE)
regModel <- c("LI", "LB", "LC")
indepModel <- c("LI", "LB")

## nbCores is left at its default value (2)
simulate.cl <- SelvarClustLasso(data.cor, nbCluster, lambda, rho, hybrid.size,
                                criterion, models, regModel, indepModel)

## End(Not run)

masedki/SelvarMix documentation built on May 21, 2019, 12:42 p.m.