This function gathers a set of the most competitive models using a backward-stepwise strategy. The visited models are written to a file with the suffix "_ExploredModels.txt". The algorithm used is described in Wilson Toussile and Elisabeth Gassiat (2009).
x 
A matrix of strings that contains the data. 
Kmax 
The maximum number of clusters to be explored. 
Criterion 
The model selection criterion in c("BIC", "AIC", "ICL", "CteDim") used for exploration (see details). 
ploidy 
The number of columns for each variable in the data. For example, ploidy = 2 for genotypic data from diploid individuals. 
ForceExclusion 
A logical value indicating whether to force exclusion. The default value is FALSE. 
emOptions 
A list of EM options (see 
Kmin 
The minimum number of clusters. The default value is set to 1. 
Smin 
A logical vector that indicates the variables to include in the selected set of clustering variables. The default value is NULL, meaning that no variable is preselected. 
project 
The name of the project. The default value is the name of the dataset. 
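The relationship between x, ploidy and Smin described above can be illustrated with a toy example. This is a minimal sketch with invented data; in particular, it assumes that Smin carries one logical entry per variable (that is, ncol(x) / ploidy entries), which this page does not state explicitly.

```r
# Toy data, invented for illustration: 4 diploid individuals, 3 loci,
# 2 columns per locus (ploidy = 2), stored as a matrix of strings.
x <- matrix(c("101", "102", "201", "201", "301", "302",
              "101", "101", "202", "201", "301", "301",
              "102", "102", "201", "202", "302", "302",
              "101", "102", "202", "202", "301", "302"),
            nrow = 4, byrow = TRUE)
ploidy <- 2

# Number of variables (loci) implied by the layout.
n_variables <- ncol(x) / ploidy

# Assumed convention: one logical entry per variable.
# Here only the first locus is preselected.
Smin <- c(TRUE, FALSE, FALSE)
stopifnot(length(Smin) == n_variables)
```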
If the penalized criterion is CteDim, a sequence of penalty functions of the form pen(K, S) = λ * dim(K, S) is used. In this form of penalty function, λ is in [0.5, log(N)], where N is the number of individuals in the sample data. Thus, the AIC and BIC penalties are in the sequence of candidate penalties.
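The family of penalties pen(K, S) = λ * dim(K, S) can be sketched in a few lines of R. This is only an illustration, not the package's internal code: N, the grid length and the model dimension model_dim are made-up values, and the mapping of AIC to λ = 1 and of BIC to λ = log(N) / 2 assumes that the criterion is written as the negative log-likelihood plus the penalty.

```r
# Hypothetical values, chosen only for illustration.
N <- 200           # number of individuals in the sample
model_dim <- 35    # dim(K, S) of some candidate model (made up)

# Candidate values of lambda on [0.5, log(N)].
lambda <- seq(0.5, log(N), length.out = 50)

# Penalty pen(K, S) = lambda * dim(K, S) for each candidate lambda.
pen <- lambda * model_dim

# Under the stated assumption, AIC and BIC correspond to these lambdas.
lambda_ref <- c(AIC = 1, BIC = log(N) / 2)

# Both lie inside [0.5, log(N)] whenever N >= exp(1).
in_range <- lambda_ref >= 0.5 & lambda_ref <= log(N)
print(in_range)
```

Under that assumption, both reference penalties fall inside the explored interval, which is consistent with the statement that AIC and BIC are among the candidates.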
A data.frame of the models selected for the chosen criterion.
Wilson Toussile
Dominique Bontemps and Wilson Toussile (2013): Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics, Volume 7, 2344-2371.
Wilson Toussile and Elisabeth Gassiat (2009): Variable selection in model-based clustering using multilocus genotype data. Adv Data Anal Classif, Vol 3, number 2, 109-134.
dimJump.R
for the data-driven calibration of the penalty function, and
model.selection.R
for the final model selection.
data(genotype1)
head(genotype1)
# Drop column 11 (negative index) before splitting the columns with ploidy = 2.
genotype2 = cutEachCol(genotype1[, -11], ploidy = 2)
head(genotype2)
# The following command creates a file "genotype2_ExploredModels.txt"
# that contains the most competitive models.
#output = backward.explorer(genotype2, Kmax = 10, ploidy = 2, Kmin = 1, Criterion = "CteDim")
data(genotype2_ExploredModels)
head(genotype2_ExploredModels)

