get_nbCluster_range: Control of number of components in Gaussian mixture modelling
In Infusion: Inference Using Simulation

get_nbCluster_range

R Documentation

Description

These functions determine the default values for the number of components tried in Gaussian mixture modelling (matching the nbCluster argument of Rmixmod::mixmodCluster()). get_nbCluster_range() allows the user to reproduce the internal rules used by Infusion to determine this argument. seq_nbCluster() is a wrapper around the function defined by the seq_nbCluster global option of the package. Its default result is a sequence of integers determined by the number of rows of the data (see Infusion.options). get_nbCluster_range() further checks the feasibility of the values generated by seq_nbCluster(), using additional criteria involving the number of columns of the data to determine the maximum feasible number of clusters. This maximum is controlled by the function defined by the maxnbCluster global option of the package.

refine_nbCluster() controls the default number of clusters used by refine(): it gets the range from seq_nbCluster() and, if the maximum of this range is higher than the onlymax argument, keeps only that maximum value; otherwise the range is left unchanged.
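The onlymax rule described above can be sketched in base R. This is only an illustration of the stated rule, not the package's actual code: keep_only_max is a hypothetical helper name, and the input ranges are made-up values rather than results of seq_nbCluster().

```r
# Hypothetical helper (NOT part of Infusion): illustrates the rule stated above.
# If the largest candidate exceeds 'onlymax', keep only that largest value;
# otherwise return the candidate range unchanged.
keep_only_max <- function(nbCluster_seq, onlymax = 7) {
  top <- max(nbCluster_seq)
  if (top > onlymax) top else nbCluster_seq
}

small_range <- keep_only_max(2:5)   # maximum 5 does not exceed 7: range kept
large_range <- keep_only_max(2:10)  # maximum 10 exceeds 7: only 10 is kept
```

Under this reading, a larger onlymax makes it more likely that the full range of candidate values is retained.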

Adventurous users can change the rules used by Infusion by changing the global options seq_nbCluster and maxnbCluster (while conforming to the interfaces of these functions). Less ambitiously, they can for example use the maximum value of the result of get_nbCluster_range() as a single reasonable value for the nbCluster argument of infer_SLik_joint.
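As a minimal sketch of the less ambitious approach, the following extracts the maximum of the feasible range as a single value. It uses only the arguments listed under Usage; the values of nr and nc are arbitrary illustrations, and the call is guarded so that the code does nothing if Infusion is not installed.

```r
# Sketch only: nr and nc below are illustrative values, not recommendations.
if (requireNamespace("Infusion", quietly = TRUE)) {
  feasible <- Infusion::get_nbCluster_range(nr = 30000L, nc = 40L,
                                            verbose = FALSE)
  nbC <- max(feasible)  # single candidate number of clusters
  # nbC could then be supplied as the nbCluster argument of infer_SLik_joint().
}
```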

Usage

seq_nbCluster(nr)
refine_nbCluster(nr, onlymax=7)
get_nbCluster_range(projdata, nr = nrow(projdata), nc = ncol(projdata), 
                    nbCluster = seq_nbCluster(nr), verbose=TRUE)

Arguments

`projdata`	data frame: the data to be clustered, which typically include parameters and projected summary statistics;
`nr`	integer: number of rows of the data to be clustered;
`onlymax`	integer: see Description;
`nc`	integer: number of columns of the data to be clustered, typically twice the number of estimated parameters;
`nbCluster`	integer or vector of integers: candidate values, whose feasibility is checked by the function;
`verbose`	boolean: whether to print some information or not.

Value

An integer vector

Examples

# Determination of number of clusters when attempting to estimate 
#   20 parameters from a reference table with 30000 rows:
seq_nbCluster(nr=30000L)
get_nbCluster_range(nr=30000L, nc=40L) # nc = *twice* the number of parameters

Infusion documentation built on Sept. 30, 2024, 9:16 a.m.