dimJump.R: Data-driven calibration of the penalty function
In ClustMMDD: Variable Selection in Clustering by Mixture Models for Discrete Data


Data-driven calibration of the penalty function using the dimension jump version of the "slope heuristics".

dimJump.R(fileOrData, h = integer(), N = integer(), header = logical())

`fileOrData`	A character string (file name) or a data frame (see details). If a data frame, it must contain columns named `logLik` and `dim`. If a file, it must have the same format as the file produced by `backward.explorer`.
`h`	An integer defining the size of the sliding window used to find the biggest jump.
`N`	The size of the sample data (number of rows).
`header`	A logical indicating whether the file contains a header row.
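
The requirement on a data frame input can be checked before calling the function. The helper below is a minimal sketch; `check_explored_models` is a hypothetical name and is not part of ClustMMDD. It only verifies that the columns `logLik` and `dim` named above are present.

```r
# Hypothetical helper (not part of ClustMMDD): verify that a data frame
# has the columns that dimJump.R expects according to the argument table.
check_explored_models <- function(x) {
  if (!is.data.frame(x)) {
    stop("'x' must be a data frame")
  }
  required <- c("logLik", "dim")
  missing_cols <- setdiff(required, names(x))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Made-up values, for illustration only.
check_explored_models(data.frame(logLik = c(-1500, -1400), dim = c(1, 2)))
```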

This function implements the dimension jump version of the so-called slope heuristics, which calibrates the penalty function from the data.

Assume that the penalty function has the form

\mathrm{pen}(K, S) = \alpha \, \lambda \, \dim(K, S),

where λ is the penalty parameter to be calibrated and α is a coefficient in [1.5, 2], to be given by the user in model.selection.R for the final selection.
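
The dimension jump idea can be illustrated with a small sketch. This is not the package's implementation: the function name `dim_jump_sketch`, the criterion `-logLik / N + lambda * dim`, the grid of λ values and the toy data are all assumptions made for illustration, and the sliding window `h` is not modelled (jumps are measured between consecutive grid points only).

```r
# Hypothetical sketch (not the ClustMMDD implementation) of a dimension jump.
# Assumed criterion: -logLik / N + lambda * dim, minimised over the models.
dim_jump_sketch <- function(models, N, lambdas) {
  # Dimension of the model selected for each value of lambda.
  selected_dim <- vapply(lambdas, function(lambda) {
    crit <- -models$logLik / N + lambda * models$dim
    models$dim[which.min(crit)]
  }, numeric(1))
  # The selected dimension decreases as lambda grows; the largest drop
  # between consecutive grid points is taken as the jump.
  drops <- -diff(selected_dim)
  j <- which.max(drops)
  list(lambda_jump = lambdas[j + 1], selected_dim = selected_dim)
}

# Toy data with made-up log-likelihoods, for illustration only.
toy <- data.frame(dim    = c(1, 2, 5, 10, 20, 40),
                  logLik = c(-1500, -1400, -1250, -1200, -1190, -1175))
out <- dim_jump_sketch(toy, N = 1000, lambdas = seq(0, 0.1, by = 0.001))
out$lambda_jump
```

Under these assumptions, the value of λ at the jump plays the role of the calibrated penalty parameter, which is then multiplied by α in the final penalty.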

The function returns a list containing two candidate values of λ and their bounds. It also produces a plot that illustrates the "slope heuristics".

Wilson Toussile

Dominique Bontemps and Wilson Toussile (2013). Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics, Volume 7, 2344-2371.
Wilson Toussile and Elisabeth Gassiat (2009). Variable selection in model-based clustering using multilocus genotype data. Advances in Data Analysis and Classification, Volume 3, Number 2, 109-134.

backward.explorer for exploring the space of competing models, model.selection.R for the final selection.

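# genotype2_ExploredModels was obtained via backward.explorer.
library(ClustMMDD)
data(genotype2_ExploredModels)
# Calibrate the penalty with a sliding window of size 5 and a sample size of 1000.
outDimJump <- dimJump.R(genotype2_ExploredModels, N = 1000, h = 5, header = TRUE)
# Inspect the first element of the returned list.
outDimJump[[1]]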