backward.explorer: Gather a set of the most competitive models

In ClustMMDD: Variable Selection in Clustering by Mixture Models for Discrete Data

Description

This function gathers a set of the most competitive models using a backward-stepwise strategy. The visited models are saved in a file whose name is the project name followed by the suffix "_ExploredModels.txt" (for example, "genotype2_ExploredModels.txt"). The algorithm is described in Wilson Toussile and Elisabeth Gassiat (2009).
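The backward-stepwise idea can be illustrated with a minimal, hypothetical R sketch for a fixed number of clusters K. It is a simplified pure backward elimination, not the ClustMMDD implementation: the published algorithm of Toussile and Gassiat (2009) may also reconsider previously excluded variables, and here the criterion is an assumed user-supplied function `crit_fun(S)` (smaller is better) instead of a fitted mixture model. The names `backward_stepwise`, `crit_fun` and `toy_crit` are illustrative only; the `Smin` argument loosely mimics the package argument of the same name.

```r
# Hypothetical sketch of a pure backward elimination over variables for a
# fixed K. It is NOT the ClustMMDD implementation; `crit_fun` is an assumed
# user-supplied function returning a penalized criterion (smaller is better)
# for a logical vector S of clustering variables.
backward_stepwise <- function(n_vars, crit_fun, Smin = rep(FALSE, n_vars)) {
  S <- rep(TRUE, n_vars)                     # start with every variable clustering
  best <- crit_fun(S)
  visited <- list(list(S = S, crit = best))  # record of accepted models
  repeat {
    candidates <- which(S & !Smin)           # preselected variables are never excluded
    if (length(candidates) == 0) break
    scores <- vapply(candidates, function(j) {
      S_try <- S
      S_try[j] <- FALSE                      # try excluding variable j
      crit_fun(S_try)
    }, numeric(1))
    if (min(scores) >= best) break           # stop: no exclusion improves the criterion
    j_best <- candidates[which.min(scores)]
    S[j_best] <- FALSE
    best <- min(scores)
    visited[[length(visited) + 1]] <- list(S = S, crit = best)
  }
  list(S = S, crit = best, visited = visited)
}

# Toy criterion (invented): variables 1-3 carry cluster structure,
# variables 4-6 are noise that only adds parameters.
toy_crit <- function(S) 10 * sum(!S[1:3]) + sum(S[4:6])
res <- backward_stepwise(6, toy_crit)
res$S  # TRUE for variables 1-3, FALSE for variables 4-6
```

Every accepted step is stored in `visited`, which plays the same role, in this toy setting, as the list of visited models written to the "_ExploredModels.txt" file.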

Usage

backward.explorer(x, Kmax, Criterion, ploidy = 1, ForceExclusion = FALSE,
                  emOptions = list(epsi = NULL, nberSmallEM = NULL,
                                   nberIterations = NULL, nberMaxIterations = NULL,
                                   typeSmallEM = NULL, typeEM = NULL,
                                   putThreshold = NULL),
                  Kmin = 1, Smin = NULL, project = deparse(substitute(x)))

Arguments

x: A matrix of strings containing the data.
Kmax: The maximum number of clusters to explore.
Criterion: The model selection criterion used for the exploration; one of "BIC", "AIC", "ICL" or "CteDim" (see Details).
ploidy: The number of columns per variable in the data. For example, ploidy = 2 for genotypic data from diploid individuals. The default value is 1.
ForceExclusion: Logical; whether to force the exclusion of variables. The default value is FALSE.
emOptions: A list of EM options (see EmOptions and setEmOptions).
Kmin: The minimum number of clusters. The default value is 1.
Smin: A logical vector indicating the variables that must be included in the selected set of clustering variables. The default value, NULL, means that no variable is preselected.
project: The name of the project. The default value is the name of the dataset.

Details

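If the criterion is CteDim, a sequence of penalty functions of the form $\mathrm{pen}(K,S) = \lambda \cdot \dim(K,S)$ is used, where $\dim(K,S)$ is the number of free parameters of the model with $K$ clusters and set $S$ of clustering variables. Here $\lambda$ ranges over $[0.5, \log(N)]$, where $N$ is the number of individuals in the sample. Since the AIC and BIC penalties are both proportional to $\dim(K,S)$, with multipliers lying in this range, both belong to the sequence of candidate penalties.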
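A minimal sketch of the CteDim idea in R, under stated assumptions: the criterion is taken on the scale $-\log L + \mathrm{pen}(K,S)$, on which AIC corresponds to $\lambda = 1$ and BIC to $\lambda = \log(N)/2$; the model table (log-likelihoods and dimensions) is invented for illustration, and `penalty_grid` and `select_K` are hypothetical helpers, not ClustMMDD functions.

```r
# Hypothetical helpers illustrating the CteDim penalty sequence; they are NOT
# part of ClustMMDD. Assumed criterion scale: -logLik + lambda * dim(K, S),
# so lambda = 1 reproduces AIC and lambda = log(N) / 2 reproduces BIC.
penalty_grid <- function(N, n_lambda = 50) {
  grid <- seq(0.5, log(N), length.out = n_lambda)
  # Add the AIC and BIC multipliers explicitly so they lie exactly on the grid
  sort(unique(c(grid, 1, log(N) / 2)))
}

# Return the K of the model minimising the penalized criterion for one lambda
select_K <- function(models, lambda) {
  crit <- -models$logLik + lambda * models$dim
  models$K[which.min(crit)]
}

# Invented example: maximized log-likelihoods and dimensions for K = 1, 2, 3
models <- data.frame(K = 1:3,
                     logLik = c(-1000, -950, -940),
                     dim = c(10, 21, 32))
N <- 200  # assumed sample size
lambdas <- penalty_grid(N)
selected <- vapply(lambdas, function(l) select_K(models, l), integer(1))
table(selected)  # how many lambda values select each K
```

With these invented numbers, the smallest multiplier ($\lambda = 0.5$) selects K = 3, both AIC ($\lambda = 1$) and BIC ($\lambda = \log(200)/2 \approx 2.65$) select K = 2, and the largest ($\lambda = \log(200)$) selects K = 1. In general, the dimension of the selected model never increases as $\lambda$ grows.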

Value

A data.frame of the models selected for the chosen criterion.

Author(s)

Wilson Toussile

See Also

dimJump.R for the data-driven calibration of the penalty function, and model.selection.R for the final model selection.
Examples

data(genotype1)
head(genotype1)
genotype2 <- cutEachCol(genotype1[, -11], ploidy = 2)
head(genotype2)
# The following command creates a file "genotype2_ExploredModels.txt"
# that contains the most competitive models. Uncomment it to run the exploration.
# output <- backward.explorer(genotype2, Kmax = 10, ploidy = 2, Kmin = 1, Criterion = "CteDim")
data(genotype2_ExploredModels)
head(genotype2_ExploredModels)