backward.explorer: Gather a set of the most competitive models.
In ClustMMDD: Variable Selection in Clustering by Mixture Models for Discrete Data


This function gathers a set of the most competitive models using a backward-stepwise strategy. The visited models are written to a file whose name ends with the suffix "_ExploredModels.txt". The algorithm is described in Wilson Toussile and Elisabeth Gassiat (2009).
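The backward-stepwise idea can be illustrated with a generic backward-elimination loop over the set S of clustering variables. This is only a sketch under stated assumptions, not the ClustMMDD implementation: it works for a single fixed number of clusters, `crit_fun` and `toy_crit` are hypothetical stand-ins for a penalized criterion computed from EM fits, and the published algorithm may differ in detail (for example by also considering inclusion steps). Like `backward.explorer`, it records every model it visits.

```r
# Generic backward-elimination sketch (NOT the ClustMMDD implementation).
# `crit_fun` is a hypothetical function mapping a set of variables to a
# penalized criterion value; lower values are better.
backward_stepwise <- function(vars, crit_fun) {
  S <- vars                                   # start from all variables
  best <- crit_fun(S)
  visited <- list(list(S = S, crit = best))   # record every model evaluated
  while (length(S) > 1) {                     # always keep at least one variable
    # Candidate models: remove each variable of S in turn
    cand <- lapply(S, function(v) setdiff(S, v))
    scores <- vapply(cand, crit_fun, numeric(1))
    visited <- c(visited, Map(function(s, cr) list(S = s, crit = cr), cand, scores))
    i_best <- which.min(scores)
    if (scores[i_best] >= best) break         # no removal improves the criterion
    S <- cand[[i_best]]
    best <- scores[i_best]
  }
  list(selected = S, crit = best, visited = visited)
}

# Hypothetical toy criterion: "a" and "b" are informative, each variable costs 1.
toy_crit <- function(S) -2 * sum(S %in% c("a", "b")) + length(S)
res <- backward_stepwise(c("a", "b", "c", "d"), toy_crit)
res$selected   # "a" "b"
```

Here the search removes "c", then "d", and stops because dropping "a" or "b" would increase the criterion.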

backward.explorer(x, Kmax, Criterion, ploidy = 1,
  ForceExclusion = FALSE, emOptions = list(epsi = NULL, nberSmallEM = NULL,
  nberIterations = NULL, nberMaxIterations = NULL, typeSmallEM = NULL, typeEM =
  NULL, putThreshold = NULL), Kmin = 1, Smin = NULL,
  project = deparse(substitute(x)))

`x`	A character matrix containing the data.
`Kmax`	The maximum number of clusters to be explored.
`Criterion`	The model selection criterion, one of c("BIC", "AIC", "ICL", "CteDim"), used for the exploration (see Details).
`ploidy`	The number of columns for each variable in the data. For example, ploidy = 2 for genotypic data from diploid individuals.
`ForceExclusion`	Logical: whether or not to force exclusion. The default value is FALSE.
`emOptions`	A list of EM options (see `EmOptions` and `setEmOptions`).
`Kmin`	The minimum number of clusters. The default value is 1.
`Smin`	A logical vector indicating the variables to include in the selected set of clustering variables. The default value is NULL (no variable is preselected).
`project`	The name of the project. The default value is the name of the dataset.
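The layout implied by `x` and `ploidy` can be sketched with a tiny hypothetical matrix: each variable occupies `ploidy` consecutive columns, so the number of variables is ncol(x) / ploidy. The data values are invented for illustration, and the assumption that `Smin` has one entry per variable (rather than per column) is not stated explicitly above.

```r
# Hypothetical toy data in the layout described for `x` and `ploidy`:
# 3 individuals, 2 variables, each variable stored in `ploidy` = 2 columns.
ploidy <- 2
x <- matrix(c("101", "103", "200", "200",
              "103", "103", "202", "200",
              "101", "101", "202", "204"),
            nrow = 3, byrow = TRUE)
nVars <- ncol(x) / ploidy                     # number of variables: 2
# Columns belonging to each variable: 1-2 for variable 1, 3-4 for variable 2
colsByVar <- split(seq_len(ncol(x)), rep(seq_len(nVars), each = ploidy))
# Assumption: Smin holds one logical entry per variable (not per column)
Smin <- c(TRUE, FALSE)                        # preselect variable 1 only
# A call would then look like (not run; the toy data are far too small):
# backward.explorer(x, Kmax = 2, Criterion = "BIC", ploidy = ploidy, Smin = Smin)
```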

If the criterion is CteDim, a sequence of penalty functions of the form $\mathrm{pen}(K, S) = \lambda \cdot \dim(K, S)$ is used, with $\lambda \in [0.5, \log N]$, where $N$ is the number of individuals in the sample. Writing the criterion as minus the log-likelihood plus the penalty, $\lambda = 1$ gives the AIC penalty and $\lambda = \log(N)/2$ gives the BIC penalty, so both belong to the sequence of candidate penalties.
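A short sketch can make the penalty family concrete. It assumes the criterion has the form -loglik + λ · dim (lower is better); the sample size, the grid of 50 values of λ and the three competing models are hypothetical. It checks where AIC and BIC fall in the grid and shows how the selected model shrinks as λ grows.

```r
# Sketch of the CteDim penalty family pen(K, S) = lambda * dim(K, S), assuming
# the criterion is -loglik + pen (lower is better).
penalized_crit <- function(loglik, d, lambda) -loglik + lambda * d

N <- 200                                        # hypothetical sample size
lambdas <- seq(0.5, log(N), length.out = 50)    # hypothetical grid over [0.5, log N]
lambda_AIC <- 1                                 # -loglik + dim is AIC / 2
lambda_BIC <- log(N) / 2                        # -loglik + (log N / 2) * dim is BIC / 2

# Hypothetical competing models (log-likelihood, number of free parameters)
models <- data.frame(loglik = c(-520, -480, -470), dim = c(10, 25, 40))
# Index of the model selected for each lambda
sel <- vapply(lambdas, function(l)
  which.min(penalized_crit(models$loglik, models$dim, l)), integer(1))
sel[1]              # 3: the largest model wins under the smallest penalty
sel[length(sel)]    # 1: the smallest model wins under the largest penalty
```

The dimension of the selected model can only decrease as λ increases; the data-driven calibration mentioned under See Also relies on the location of such jumps.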

A data.frame of the models selected by the chosen criterion.

Wilson Toussile

Dominique Bontemps and Wilson Toussile (2013). Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics, 7, 2344-2371.
Wilson Toussile and Elisabeth Gassiat (2009). Variable selection in model-based clustering using multilocus genotype data. Advances in Data Analysis and Classification, 3(2), 109-134.

dimJump.R for the data-driven calibration of the penalty function and model.selection.R for the final model selection.

data(genotype1)
head(genotype1) 
genotype2 = cutEachCol(genotype1[, -11], ploidy = 2)
head(genotype2)

# The following command creates a file "genotype2_ExploredModels.txt"
# that contains the most competitive models. It is not run here;
# the precomputed result is loaded below.

#output = backward.explorer(genotype2, Kmax = 10, ploidy = 2, Kmin = 1, Criterion = "CteDim")

data(genotype2_ExploredModels)
head(genotype2_ExploredModels)