This function gathers a set of the most competitive models using a backward-stepwise strategy. The visited models are written to a file whose name ends with the suffix "_ExploredModels.txt". The algorithm is described in Toussile and Gassiat (2009).
x
A matrix of strings that contains the data.
Kmax
The maximum number of clusters to be explored.
Criterion
The model selection criterion used for exploration, one of c("BIC", "AIC", "ICL", "CteDim") (see details).
ploidy
The number of columns for each variable in the data. For example, ploidy = 2 for genotypic data from diploid individuals.
ForceExclusion
A logical value indicating whether to force exclusion. The default value is FALSE.
emOptions
A list of EM options (see
Kmin
The minimum number of clusters. The default value is 1.
Smin
A logical vector that indicates the variables to include in the selected set of clustering variables. The default value is NULL, meaning that no variable is preselected.
project
The name of the project. The default value is the name of the dataset.
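To make the expected shape of x concrete, here is a minimal base-R sketch of a toy data set for ploidy = 2. The allele codes, the number of individuals and loci, the assumption that the two columns of a variable sit next to each other, and the assumption that Smin carries one flag per variable (rather than per column) are all illustrative and are not taken from the package documentation.

```r
# Hypothetical toy data: 4 diploid individuals typed at 3 loci.
# Assumption: with ploidy = 2, each locus occupies two adjacent columns of x.
ploidy <- 2
n_loci <- 3
x <- matrix(
  c("101", "103", "200", "200", "305", "301",
    "101", "101", "200", "204", "301", "301",
    "103", "103", "204", "204", "305", "305",
    "101", "103", "200", "204", "301", "305"),
  nrow = 4, byrow = TRUE
)

# x must be a matrix of strings with ploidy columns per variable.
stopifnot(is.character(x), ncol(x) == ploidy * n_loci)

# Assumed shape for Smin: one logical flag per variable (locus).
# Here the first locus would be preselected as a clustering variable.
Smin <- c(TRUE, FALSE, FALSE)
```

With real data, the Examples section uses cutEachCol to obtain this kind of layout from a data set that stores one column per variable.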
If the penalized criterion is CteDim, a sequence of penalty functions of the form pen(K, S) = λ * dim(K, S) is used. In this form of penalty function, λ is in [0.5, log(N)], where N is the number of individuals in the sample data. Thus, the AIC and BIC penalties are in the sequence of candidate penalties.
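The penalty family used with CteDim can be illustrated with a short base-R sketch. The sample size, the model dimension, and the evenly spaced grid of 10 values are hypothetical; the source does not say how the function actually discretizes the interval. The mapping of AIC and BIC to particular λ values assumes the common convention in which the criterion is the negative log-likelihood plus the penalty.

```r
# Hypothetical values for illustration only.
N <- 200        # number of individuals in the sample
dim_KS <- 35    # assumed dimension of a model (K, S)

# Assumed grid of candidate lambda values spanning [0.5, log(N)].
lambda <- seq(0.5, log(N), length.out = 10)

# Penalty pen(K, S) = lambda * dim(K, S) for each candidate lambda.
pen <- lambda * dim_KS

# Under the assumed convention (criterion = -log-likelihood + penalty),
# AIC corresponds to lambda = 1 and BIC to lambda = log(N) / 2.
lambda_aic <- 1
lambda_bic <- log(N) / 2
stopifnot(lambda_aic >= 0.5, lambda_aic <= log(N),
          lambda_bic >= 0.5, lambda_bic <= log(N))

# One row per candidate penalty.
data.frame(lambda = lambda, penalty = pen)
```

Both checks pass for this N, which is consistent with the statement that the AIC and BIC penalties belong to the candidate family.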
A data.frame of the selected models for the chosen criterion.
Wilson Toussile
Dominique Bontemps and Wilson Toussile (2013): Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics, Volume 7, 2344-2371.
Wilson Toussile and Elisabeth Gassiat (2009): Variable selection in model-based clustering using multilocus genotype data. Adv Data Anal Classif, Volume 3, Number 2, 109-134.
dimJump.R for the data-driven calibration of the penalty function, and model.selection.R for the final model selection.
# Load the example data set and inspect its first rows.
data(genotype1)
head(genotype1)
# Drop column 11 and split each remaining column into ploidy = 2 columns.
genotype2 <- cutEachCol(genotype1[, -11], ploidy = 2)
head(genotype2)
# The following command creates a file "genotype2_ExploredModels.txt"
# that contains the most competitive models.
#output <- backward.explorer(genotype2, Kmax = 10, ploidy = 2, Kmin = 1, Criterion = "CteDim")
# Load the precomputed exploration results instead of running the command above.
data(genotype2_ExploredModels)
head(genotype2_ExploredModels)