selectK.R: Selection of the number K of clusters.
In ClustMMDD: Variable Selection in Clustering by Mixture Models for Discrete Data


Selects the number K of clusters for a given subset S of clustering variables.

selectK.R(xdata, S, Kmax, ploidy = 1, Kmin = 1,
  emOptions = list(epsi = 1e-05, nberSmallEM = 20, nberIterations = 15,
  nberMaxIterations = 5000, typeSmallEM = 0, typeEM = 0, putThreshold = FALSE),
  cte = 1, project = deparse(substitute(xdata)))

`xdata`	A dataset in which the data for each variable occupy `ploidy` column(s).
`S`	A subset of clustering variables, given as a logical vector of length P, the number of variables in `xdata`.
`Kmax`	The maximum number of clusters to be explored.
`ploidy`	The number of occurrences of each variable in the data. For example, `ploidy = 2` for genotype data.
`Kmin`	The minimum number of clusters to be explored. The default value is set to 1.
`emOptions`	A list of EM options (see `EmOptions` and `setEmOptions`).
`cte`	A double used by the selection criterion named `CteDim`, whose penalty function is pen(K,S) = cte*dim, where `dim` is the number of free parameters.
`project`	The name of the project. The default value is the name of the dataset.
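
The layout expected for `xdata` and `S` can be sketched in base R. This is only an illustration under stated assumptions: the toy data frame, the names `clusteringVars`, `dimModel` and `penCteDim`, and the relation P = ncol(xdata) / ploidy are not part of the package; they are inferred from the argument descriptions above.

```r
# Hypothetical toy data (not from the package): P = 3 variables, ploidy = 2,
# so each variable occupies 2 adjacent columns and xdata has 6 columns.
ploidy <- 2
xdata <- data.frame(
  v1_a = c(101, 103, 101), v1_b = c(103, 103, 105),
  v2_a = c(1, 2, 2),       v2_b = c(2, 2, 1),
  v3_a = c(7, 7, 9),       v3_b = c(9, 7, 9)
)

# Assumption: the number of variables P is ncol(xdata) / ploidy.
P <- ncol(xdata) / ploidy

# Build S from the indices of the variables to use for clustering.
clusteringVars <- c(1, 3)            # illustrative choice
S <- seq_len(P) %in% clusteringVars  # logical vector of length P
stopifnot(is.logical(S), length(S) == P)

# The CteDim penalty as described for `cte`: the constant times the
# number of free parameters. `dimModel` is a hypothetical count
# supplied by the caller, not a value computed by the package.
penCteDim <- function(dimModel, cte = 1) cte * dimModel
penCteDim(dimModel = 12, cte = 1.5)
```

Building `S` from variable indices rather than typing the logical vector by hand makes it easier to keep its length equal to P when the set of clustering variables changes.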

A list of estimated parameters for each selection criterion.

Wilson Toussile

Dominique Bontemps and Wilson Toussile (2013): Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics, Volume 7, 2344-2371, ISSN.
Wilson Toussile and Elisabeth Gassiat (2009): Variable selection in model-based clustering using multilocus genotype data. Adv Data Anal Classif, Vol 3, number 2, 109-134.

`backward.explorer` for further exploration of the space of competing models, `dimJump.R` for data-driven calibration of the penalty function, and `model.selection.R` for model selection.

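# Load the example dataset and inspect its first rows
data(genotype1)
head(genotype1)
# Drop column 11 and pass the rest to cutEachCol with ploidy = 2
genotype2 = cutEachCol(genotype1[, -11], ploidy = 2)
head(genotype2)
# Use the first 8 of the 10 variables as clustering variables
S = c(rep(TRUE, 8), rep(FALSE, 2))
## Not run: 
# Explore K = 1, ..., 6 clusters for the subset S
outPut = selectK.R(genotype2, S, Kmax = 6, ploidy = 2, Kmin=1)
# Result for the BIC criterion
outPut[["BIC"]]

# Clean up the explored-models file
file.remove("genotype2_ExploredModels.txt")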

## End(Not run)