Description
This function implements variable selection in model-based clustering using a lasso ranking of the variables, as described in Sedki et al. (2014). The variable-ranking step uses the penalized EM algorithm of Zhou et al. (2009).
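For intuition about the ranking step, here is a deliberately simplified R sketch. It is not SelvarMix's code. It assumes hypothetical, already-standardized cluster-mean estimates and applies soft-thresholding (the proximal operator of an ℓ1 penalty) directly, instead of running the penalized EM algorithm of Zhou et al. (2009), which also penalizes the precision matrix through rho. A variable's score is the number of lambda values in a made-up grid for which at least one of its cluster means stays nonzero, and higher scores rank first.

```r
# Simplified illustration of an l1-based variable ranking (not SelvarMix internals).

# Soft-thresholding: proximal operator of the penalty lambda * |x|.
soft_threshold <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

# Hypothetical standardized cluster means: rows = clusters, columns = variables.
mu <- rbind(c( 3.0,  0.4, -1.5, 0.10),
            c(-2.5, -0.3,  1.2, 0.05))
colnames(mu) <- paste0("X", 1:4)

# Hypothetical grid of l1 tuning parameters.
lambda_grid <- c(0.2, 0.5, 1, 2)

# For each lambda, a variable is "active" if any of its cluster means is nonzero
# after shrinkage (rows of `active` = variables, columns = lambda values).
active <- sapply(lambda_grid, function(l) colSums(soft_threshold(mu, l) != 0) > 0)

# Score = number of grid points at which the variable stays active.
score <- rowSums(active)
ranking <- names(sort(score, decreasing = TRUE))
print(ranking)  # "X1" "X3" "X2" "X4"
```

Variables whose means survive larger penalties are considered more relevant for clustering; the actual scoring rule used by SelvarMix may differ in its details.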
Usage
SelvarClustLasso(data, nbCluster, lambda, rho, hybrid.size, criterion,
                 models, regModel, indepModel, nbCores)
Arguments
data: Matrix containing quantitative data. Rows correspond to observations and columns correspond to variables.
nbCluster: Numeric vector listing the candidate numbers of clusters (positive integers).
lambda: Numeric vector of candidate values for the tuning parameter of the ℓ1 penalty on the cluster means.
rho: Numeric vector of candidate values for the tuning parameter of the ℓ1 penalty on the precision matrix.
hybrid.size: Optional parameter that makes the hybrid forward and backward algorithms used to select the S and W sets less strict.
criterion: Character vector defining the criterion used to select the best model; the best model is the one with the highest criterion value. Possible values: "BIC", "ICL" or c("BIC", "ICL"). Default is "BIC".
models: A Rmixmod [
regModel: Character vector defining the covariance matrix form for the linear regression of U on the R set of variables: "LI" (spherical), "LB" (diagonal) or "LC" (general). Possible values: "LI", "LB", "LC", c("LI", "LB"), c("LI", "LC"), c("LB", "LC") and c("LI", "LB", "LC"). Default is c("LI", "LB", "LC").
indepModel: Character vector defining the covariance matrix form for the independent variables W: "LI" (spherical) or "LB" (diagonal). Possible values: "LI", "LB" or c("LI", "LB"). Default is c("LI", "LB").
nbCores: Number of CPU cores used for parallel computation (default is 2).
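The `criterion` argument follows a "larger is better" convention. As a hedged illustration, the R sketch below assumes one common form in that convention, BIC = 2·log L − ν·log n (ν = number of free parameters), and an ICL that subtracts twice the classification entropy of the conditional probabilities. The log-likelihoods, parameter counts and sample size are made up, and SelvarMix's own parameter count for its models is not reproduced here.

```r
# Criterion values in a "larger is better" convention (assumed form, not
# the exact SelvarMix computation).
bic <- function(logLik, nbPar, n) 2 * logLik - nbPar * log(n)

# Classification entropy of a matrix of conditional probabilities t_ik
# (rows = observations); 0 * log(0) is treated as 0.
entropy <- function(proba) -sum(proba * log(pmax(proba, .Machine$double.xmin)))

# ICL penalizes BIC by twice the entropy, favouring well-separated clusters.
icl <- function(logLik, nbPar, n, proba) bic(logLik, nbPar, n) - 2 * entropy(proba)

# Hypothetical candidate models differing in the number of clusters K.
candidates <- data.frame(K = c(3, 4), logLik = c(-1200, -1185), nbPar = c(20, 27))
n <- 500

candidates$BIC <- bic(candidates$logLik, candidates$nbPar, n)
best <- candidates[which.max(candidates$BIC), ]
print(best$K)  # 3: the log-likelihood gain does not pay for 7 extra parameters
```

With crisp (0/1) conditional probabilities the entropy is zero and ICL coincides with BIC; the fuzzier the assignments, the lower ICL falls below BIC.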
Value
For each criterion (BIC or ICL), the function returns:
S: The selected set of relevant clustering variables.
R: The selected subset of regressors, i.e. the variables of S used to explain U.
U: The selected set of redundant variables.
W: The selected set of independent variables.
criterionValue: The criterion value of the selected model.
nbCluster: The selected number of clusters.
model: The selected Gaussian mixture form.
regModel: The selected covariance form for the regression.
indepModel: The selected covariance form for the independent Gaussian distribution.
proba: Matrix containing, for every observation, the conditional probabilities of belonging to each cluster.
partition: Vector of length n containing the cluster assignments of the n observations according to the Maximum-a-Posteriori (MAP) rule.
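The returned components can be sanity-checked and reused directly. The R sketch below uses a hypothetical `res` list whose elements carry the component names listed above; it is not real SelvarClustLasso output, and how the per-criterion results are nested in the returned object is not shown here. It checks the role structure of the model of Maugis et al. (2009), in which S, U and W split the variables and R is a subset of S, and recovers the MAP partition from `proba`.

```r
# Hypothetical components for p = 6 variables and 5 observations (made-up values).
res <- list(
  S = c(1, 2), R = 2, U = c(3, 4), W = c(5, 6),
  proba = matrix(c(0.90, 0.10,
                   0.20, 0.80,
                   0.60, 0.40,
                   0.05, 0.95,
                   0.70, 0.30), ncol = 2, byrow = TRUE)
)
p <- 6

# S, U and W are pairwise disjoint and together cover all p variables;
# the regressors R must be chosen among the relevant variables S.
roles_ok <- length(intersect(res$S, res$U)) == 0 &&
  length(intersect(res$S, res$W)) == 0 &&
  length(intersect(res$U, res$W)) == 0 &&
  setequal(c(res$S, res$U, res$W), seq_len(p)) &&
  all(res$R %in% res$S)

# Maximum-a-Posteriori rule: assign each observation to its most probable cluster.
partition <- max.col(res$proba, ties.method = "first")
print(partition)  # 1 2 1 2 1
```

`ties.method = "first"` makes the assignment deterministic when two clusters are equally probable; the package itself may break ties differently.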
Author(s)
Mohammed Sedki <mohammed.sedki@u-psud.fr>
References
Zhou, H., Pan, W., and Shen, X., 2009. "Penalized model-based clustering with unconstrained covariance matrices". Electronic Journal of Statistics, vol. 3, pp. 1473-1496.
Maugis, C., Celeux, G., and Martin-Magniette, M. L., 2009. "Variable selection in model-based clustering: A general variable role modeling". Computational Statistics and Data Analysis, vol. 53/11, pp. 3872-3882.
Sedki, M., Celeux, G., and Maugis-Rabusseau, C., 2014. "SelvarMix: A R package for variable selection in model-based clustering and discriminant analysis with a regularization approach". Inria Research Report, available at http://hal.inria.fr/hal-01053784
See Also
SelvarLearnLasso, SortvarClust, SortvarLearn, scenarioCor
Examples
## Not run:
## Simulated data example as shown in Maugis et al. (2009)
## n = 2000 observations, p = 14 variables
library(SelvarMix)
library(Rmixmod)
library(glasso)
data(scenarioCor)
data.cor <- scenarioCor[, 1:14]        # first 14 columns = the p = 14 variables
lambda <- seq(20, 100, by = 10)        # grid for the l1 mean penalty
rho <- seq(1, 2, length.out = 2)       # grid for the l1 precision-matrix penalty
hybrid.size <- 3
nbCluster <- c(3, 4)                   # candidate numbers of clusters
criterion <- "BIC"
models <- mixmodGaussianModel(family = "spherical", equal.proportions = TRUE)
regModel <- c("LI", "LB", "LC")
indepModel <- c("LI", "LB")
simulate.cl <- SelvarClustLasso(data.cor, nbCluster, lambda, rho, hybrid.size,
                                criterion, models, regModel, indepModel)
## End(Not run)
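A run such as the example above can take a long time, so it may help to check the arguments against the documented values first. `check_selvar_args()` below is a hypothetical helper written for illustration, not a SelvarMix function; it only encodes the constraints stated in the argument descriptions and does not replicate any checks the package performs itself.

```r
# Hypothetical helper (not part of SelvarMix): returns TRUE invisibly if the
# arguments respect the documented constraints, otherwise stops with an error.
check_selvar_args <- function(nbCluster, lambda, rho,
                              criterion = "BIC",
                              regModel = c("LI", "LB", "LC"),
                              indepModel = c("LI", "LB"),
                              nbCores = 2) {
  stopifnot(
    is.numeric(nbCluster), length(nbCluster) > 0,
    all(nbCluster >= 1), all(nbCluster == round(nbCluster)),  # positive integers
    is.numeric(lambda), length(lambda) > 0,
    is.numeric(rho), length(rho) > 0,
    length(criterion) > 0, all(criterion %in% c("BIC", "ICL")),
    length(regModel) > 0, all(regModel %in% c("LI", "LB", "LC")),
    length(indepModel) > 0, all(indepModel %in% c("LI", "LB")),
    length(nbCores) == 1, nbCores >= 1
  )
  invisible(TRUE)
}

# Values taken from the example above.
check_selvar_args(nbCluster = c(3, 4), lambda = seq(20, 100, by = 10),
                  rho = seq(1, 2, length.out = 2), criterion = "BIC")
```

Failing fast on, say, indepModel = "LC" (which is not an allowed form for W) avoids discovering the mistake only after an expensive computation has started.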