# Regularization for variable selection in model-based clustering

### Description

This function performs variable selection in model-based clustering, using a lasso ranking of the variables as described in Sedki et al. (2014). The variable ranking step uses the penalized EM algorithm of Zhou et al. (2009).
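
To make the idea of a lasso ranking concrete, here is a minimal sketch in base R. It assumes that penalized cluster means have already been estimated for several penalty settings, and it ranks variables by how often at least one of their cluster means stays non-zero. The `mu_hat` values are invented for illustration, and the scoring rule is an assumption about how such a ranking can be built, not a copy of the package internals.

```
# Hypothetical penalized mean estimates (2 clusters x 4 variables) for three
# penalty settings of increasing strength. Values are invented for illustration.
mu_hat <- list(
  matrix(c( 1.2, -0.5, 0.3, 0.0,
           -1.1,  0.4, 0.0, 0.0), nrow = 2, byrow = TRUE),
  matrix(c( 0.9, -0.2, 0.0, 0.0,
           -0.8,  0.0, 0.0, 0.0), nrow = 2, byrow = TRUE),
  matrix(c( 0.4,  0.0, 0.0, 0.0,
           -0.3,  0.0, 0.0, 0.0), nrow = 2, byrow = TRUE)
)

# A variable counts as "active" for a setting if at least one cluster mean
# is non-zero (assumes centered data). Result: variables x settings matrix.
active <- sapply(mu_hat, function(m) colSums(m != 0) > 0)

# Assumed score: number of settings in which the variable stays active
score <- rowSums(active)

# Variables ordered from most to least relevant for clustering
ranking <- order(score, decreasing = TRUE)
print(score)
print(ranking)
```

Variables whose means are shrunk to zero under light penalties end up at the bottom of the ranking, which is what lets a later selection step scan the variables in a sensible order.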

### Usage

```
SelvarClustLasso(data, nbCluster, lambda, rho, hybrid.size, criterion,
                 models, regModel, indepModel, nbCores)
```

### Arguments

| Argument | Description |
|---|---|
| `data` | matrix containing quantitative data. Rows correspond to observations and columns correspond to variables |
| `nbCluster` | numeric vector listing the candidate numbers of clusters (must be positive integers) |
| `lambda` | numeric vector listing the values of the tuning parameter for the ℓ1 penalty on the means |
| `rho` | numeric vector listing the values of the tuning parameter for the ℓ1 penalty on the precision matrices |
| `hybrid.size` | optional parameter that makes the hybrid forward and backward algorithms used to select the `S` and `W` sets less stringent |
| `criterion` | character vector defining the criterion used to select the best model. The best model is the one with the highest criterion value. Possible values: `"BIC"`, `"ICL"`, `c("BIC", "ICL")`. Default is `"BIC"` |
| `models` | a Rmixmod [`Model`] object defining the list of models to run (see `mixmodGaussianModel()` in the Rmixmod package) |
| `regModel` | character vector defining the covariance matrix form for the linear regression of `U` on the `R` set of variables |
| `indepModel` | character vector defining the covariance matrix form for the independent variables `W` |
| `nbCores` | number of CPUs to be used when parallel computing is utilized (default is 2) |
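
As a rough guide to how the vector-valued arguments combine, the base R sketch below builds the candidate values used in the package example, checks the stated constraint on `nbCluster`, and counts the resulting combinations. That every (`nbCluster`, `lambda`, `rho`) combination gets its own penalized fit is an assumption made here for illustration; the `grid` object is not part of the package.

```
# Candidate values, mirroring the package example
nbCluster <- c(3, 4)
lambda <- seq(20, 100, by = 10)
rho <- seq(1, 2, length.out = 2)

# nbCluster must contain positive integers
stopifnot(is.numeric(nbCluster),
          all(nbCluster > 0),
          all(nbCluster == round(nbCluster)))

# Assumption: one penalized fit per (nbCluster, lambda, rho) combination
grid <- expand.grid(nbCluster = nbCluster, lambda = lambda, rho = rho)
nrow(grid)  # 2 * 9 * 2 = 36 combinations
```

Counting combinations before running the function gives a quick sense of the computational cost and helps when choosing `nbCores`.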

### Value

For each criterion (BIC or ICL), the function returns the following components:

| Component | Description |
|---|---|
| `S` | The selected set of relevant clustering variables |
| `R` | The selected subset of regressors |
| `U` | The selected set of redundant variables |
| `W` | The selected set of independent variables |
| `criterionValue` | The criterion value for the selected model |
| `nbCluster` | The selected number of clusters |
| `model` | The selected Gaussian mixture form |
| `regModel` | The selected covariance form for the regression |
| `indepModel` | The selected covariance form for the independent Gaussian distribution |
| `proba` | Matrix containing the conditional probabilities of belonging to each cluster for all observations |
| `partition` | Vector of length n (the number of observations) containing the cluster assignment of each observation according to the maximum a posteriori rule |
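
The link between `proba` and `partition` can be illustrated with a small base R sketch. It assumes that the partition follows the usual maximum a posteriori (MAP) rule, that is, each observation is assigned to the cluster with the highest conditional probability. The probability matrix below is hypothetical and does not come from the package.

```
# Hypothetical conditional probabilities: 4 observations (rows), 3 clusters (columns)
proba <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.8, 0.1,
                  0.3, 0.3, 0.4,
                  0.5, 0.4, 0.1),
                ncol = 3, byrow = TRUE)

# Each row is a probability distribution over clusters
stopifnot(all(abs(rowSums(proba) - 1) < 1e-12))

# MAP rule (assumed): pick the cluster with the largest probability in each row
partition <- apply(proba, 1, which.max)
partition  # 1 2 3 1
```

Rows of `proba` with several similar values, such as the third one here, flag observations whose assignment is uncertain.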

### Author(s)

Mohammed Sedki <mohammed.sedki@u-psud.fr>

### References

Zhou, H., Pan, W., and Shen, X., 2009. "Penalized model-based clustering with unconstrained covariance matrices". Electronic Journal of Statistics, vol. 3, pp. 1473-1496.

Maugis, C., Celeux, G., and Martin-Magniette, M. L., 2009. "Variable selection in model-based clustering: A general variable role modeling". Computational Statistics and Data Analysis, vol. 53/11, pp. 3872-3882.

Sedki, M., Celeux, G., Maugis-Rabusseau, C., 2014. "SelvarMix: A R package for variable selection in model-based clustering and discriminant analysis with a regularization approach". Inria Research Report available at http://hal.inria.fr/hal-01053784

### See Also

SelvarLearnLasso, SortvarClust, SortvarLearn, scenarioCor

### Examples

```
## Not run:
## Simulated data example as shown in Maugis et al. (2009)
## n = 2000 observations, p = 14 variables
require(SelvarMix)
require(Rmixmod)
require(glasso)
data(scenarioCor)
data.cor <- scenarioCor[, 1:14]

## Grids of tuning parameters for the lasso ranking step
lambda <- seq(20, 100, by = 10)
rho <- seq(1, 2, length.out = 2)
hybrid.size <- 3

## Candidate numbers of clusters and model selection criterion
nbCluster <- c(3, 4)
criterion <- "BIC"

## Candidate mixture, regression and independence covariance forms
models <- mixmodGaussianModel(family = "spherical", equal.proportions = TRUE)
regModel <- c("LI", "LB", "LC")
indepModel <- c("LI", "LB")

simulate.cl <- SelvarClustLasso(data.cor, nbCluster, lambda, rho, hybrid.size,
                                criterion, models, regModel, indepModel)
## End(Not run)
```
