MoE_stepwise: Stepwise model/variable selection for MoEClust models


Description

Conducts a greedy forward stepwise search to identify the optimal MoEClust model according to some criterion. Components and/or gating covariates and/or expert covariates are added to new MoE_clust fits at each step, while each step is evaluated for all valid modelNames.
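To illustrate the greedy forward idea only, the base-R sketch below adds one candidate covariate at a time to a linear regression. At each step it keeps whichever single addition most improves the BIC, and it stops when no addition helps. This is not MoE_stepwise's implementation. The helper greedy_forward, the mtcars response and the candidate list are hypothetical choices for the example. Note that base R's BIC() is smaller-is-better.

```r
# Hypothetical sketch of a greedy forward stepwise search (NOT MoE_stepwise's
# implementation): at each step try every single remaining addition, keep the
# one with the best criterion value, and stop when no addition improves it.
greedy_forward <- function(response, candidates, data) {
  # BIC of a linear model with the given covariates (base R: smaller is better)
  fit_bic <- function(vars) {
    rhs <- if (length(vars)) paste(vars, collapse = " + ") else "1"
    BIC(lm(as.formula(paste(response, "~", rhs)), data = data))
  }
  selected <- character(0)
  current  <- fit_bic(selected)          # start from the covariate-free model
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # Evaluate every one-variable addition to the current model
    scores <- vapply(remaining, function(v) fit_bic(c(selected, v)), numeric(1))
    if (min(scores) >= current) break    # no addition improves BIC: stop
    selected <- c(selected, names(which.min(scores)))
    current  <- min(scores)
  }
  list(selected = selected, bic = current)
}

res <- greedy_forward("mpg", c("wt", "hp", "qsec", "drat"), mtcars)
print(res)
```

MoE_stepwise differs in two ways. Each of its steps can also add a component. Every candidate is then refitted across all entries in modelNames, rather than fitting a single model per candidate.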

Usage

MoE_stepwise(data,
             network.data = NULL,
             gating = NULL,
             expert = NULL,
             modelNames = NULL,
             noise = FALSE,
             criterion = c("bic", "icl", "aic"),
             equalPro = c("both", "yes", "no"),
             noise.gate = c("both", "yes", "no"),
             verbose = interactive(),
             ...)

Arguments

data

A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

network.data

An optional matrix or data frame in which to look for the covariates specified in the gating and/or expert networks, if any. Must include column names. Columns in network.data corresponding to columns in data will be automatically removed. While a single covariate can be supplied as a vector (provided the '$' operator is not used), it is safer to supply a named 1-column matrix or data frame in this instance.

gating

A vector giving the names of columns in network.data used to define the scope of the gating network. The initial model will contain no covariates; thereafter, all variables in gating will be considered for inclusion where appropriate.

If gating is not supplied, all variables in network.data will be considered for the gating network. gating can also be supplied as NA, in which case no gating network covariates will ever be considered. Supplying gating and expert can be used to ensure different subsets of covariates enter different parts of the model.

expert

A vector giving the names of columns in network.data used to define the scope of the expert network. The initial model will contain no covariates; thereafter, all variables in expert will be considered for inclusion where appropriate.

If expert is not supplied, all variables in network.data will be considered for the expert network. expert can also be supplied as NA, in which case no expert network covariates will ever be considered. Supplying expert and gating can be used to ensure different subsets of covariates enter different parts of the model.
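The documented scoping rules for gating and expert can be summarised by the hypothetical helper below. If the argument is omitted, every column of network.data is in scope. If it is NA, no column is in scope. Otherwise, the named subset is used. The helper also removes columns shared with data. It is a sketch of the documented behaviour, not code from MoEClust. In particular, silently dropping names that are not available is an assumption.

```r
# Hypothetical helper summarising the documented scoping rules for the
# 'gating' and 'expert' arguments; not code taken from MoEClust.
resolve_scope <- function(scope, network.data, data) {
  # Documented: columns of network.data that also appear in data are removed
  available <- setdiff(colnames(network.data), colnames(data))
  if (is.null(scope)) return(available)                          # omitted: all columns
  if (length(scope) == 1L && is.na(scope)) return(character(0))  # NA: no covariates
  intersect(scope, available)  # assumption: keep only usable, non-duplicated names
}

toy_data <- data.frame(y1 = rnorm(5), y2 = rnorm(5))
toy_net  <- data.frame(y1 = 1:5, sex = gl(2, 1, 5), Ht = 1:5)

resolve_scope(NULL,  toy_net, toy_data)   # "sex" "Ht"
resolve_scope(NA,    toy_net, toy_data)   # character(0)
resolve_scope("sex", toy_net, toy_data)   # "sex"
```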

modelNames

A character vector of valid model names, used to restrict the size of the search space if desired. By default, all valid model types are explored. Rather than treating a change of model type as an additional step, every step is evaluated over all entries in modelNames. See MoE_clust for more details.

noise

A logical indicating whether to assume all models contain an additional noise component (TRUE) or not (FALSE, the default). When TRUE, the search starts from a G=0 noise-only model; otherwise, the search starts from a G=1 model with no covariates. See MoE_control for more details.

criterion

The model selection criterion used to determine the optimal action at each step. Defaults to "bic".
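The sketch below shows how the three criteria are commonly computed under the mclust sign convention, in which larger values are better. The helper crit_values and its inputs are hypothetical. It takes the log-likelihood, the number of estimated parameters df, and an n x G matrix z of posterior membership probabilities. Whether MoEClust's ICL uses exactly this hard-classification entropy term is an assumption, so consult the package documentation for the authoritative definitions.

```r
# Hypothetical helper: BIC, ICL and AIC under the mclust sign convention
# (larger is better). MoEClust's exact definitions may differ in detail.
crit_values <- function(loglik, df, z) {
  n   <- nrow(z)
  bic <- 2 * loglik - df * log(n)
  # ICL adds an entropy penalty using the hard (MAP) classification C
  C   <- (z == apply(z, 1, max)) * 1
  icl <- bic + 2 * sum(C * log(pmax(z, .Machine$double.xmin)))
  aic <- 2 * loglik - 2 * df
  c(bic = bic, icl = icl, aic = aic)
}

# Toy posterior probabilities for n = 3 observations and G = 2 components
z <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.6, 0.4), ncol = 2, byrow = TRUE)
crit_values(loglik = -100, df = 5, z = z)
```

Because the entropy term is never positive, ICL is at most BIC. It therefore tends to favour models whose components are well separated.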

equalPro

A character string indicating whether models with equal mixing proportions should be considered. "both" (the default) means models with both equal and unequal mixing proportions will be considered, "yes" means only models with equal mixing proportions will be considered, and "no" means only models with unequal mixing proportions will be considered.

Considering "both" equal and unequal mixing proportion models increases the search space and the computational burden. However, this argument becomes irrelevant once a model with gating network covariate(s), if any, is deemed optimal at a given step. See MoE_control for more details.

noise.gate

A character string indicating whether models where the gating network for the noise component depends on covariates are considered. "yes" means only models where this is the case will be considered, "no" means only models for which the noise component's mixing proportion is constant will be considered and "both" (the default) means both of these scenarios will be considered.

Considering "both" increases the search space and the computational burden, but this argument is only relevant when noise=TRUE and gating covariates are being considered. See MoE_control for more details.

verbose

Logical indicating whether to print messages pertaining to progress to the screen during fitting. Defaults to TRUE if the session is interactive, and FALSE otherwise. If FALSE, warnings and error messages will still be printed to the screen, but everything else will be suppressed.

...

Additional arguments to MoE_control. Note that these arguments will be supplied to all candidate models for every step.

Details

The arguments modelNames, equalPro, and noise.gate are provided for computational convenience. They can be used to reduce the number of models under consideration at each stage.

The same is true of the arguments gating and expert, which can each separately be made to consider all variables in network.data, or a subset, or none at all.

Without any prior information, it is best to accept the defaults at the expense of a longer run-time.
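The hypothetical base-R helper below shows how modelNames, equalPro and noise.gate multiply the work done at each step. It counts the combinations of fits for a single candidate action. It assumes the 14 multivariate mclust-style covariance models as the default modelNames. The real candidate set in MoE_stepwise also depends on G, the data dimension, and whether gating covariates or a noise component are present, since equalPro and noise.gate do not always apply. The counts are therefore indicative only.

```r
# Hypothetical illustration: size of the grid of fits for ONE candidate action
# at ONE step. Assumes the 14 multivariate mclust-style covariance models as
# the default; the real candidate set also depends on G, the dimension, and
# whether gating covariates or a noise component are present.
all_models <- c("EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE",
                "VEE", "EVE", "VVE", "EEV", "VEV", "EVV", "VVV")

grid_size <- function(modelNames, equalPro = "both", noise.gate = "both") {
  eq <- switch(equalPro,   both = c(TRUE, FALSE), yes = TRUE, no = FALSE)
  ng <- switch(noise.gate, both = c(TRUE, FALSE), yes = TRUE, no = FALSE)
  nrow(expand.grid(model = modelNames, equalPro = eq, noise.gate = ng))
}

grid_size(all_models)                    # 14 * 2 * 2 = 56
grid_size(c("EVE", "EEE"), "no", "no")   # 2 * 1 * 1 = 2
```

A grid of this kind is refitted for every candidate action (an extra component, or an extra covariate in either network) at every step. This is why accepting the defaults lengthens the run-time.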

Value

An object of class "MoECompare" containing information on all visited models and the optimal model (accessible via x$optimal).

Note

It is advised to run this function once with noise=FALSE and once with noise=TRUE and then choose the optimal model across both sets of results.

At present, only additions (of components and covariates) are considered. In future updates, it will be possible to allow both additions and removals.

The function will attempt to remove duplicate variables found in both data and network.data; in particular, they will be removed from network.data. Users are nevertheless advised to carefully specify data and network.data such that there are no duplicates, especially if the desired variable(s) should belong to network.data.

Author(s)

Keefe Murphy - <keefe.murphy@mu.ie>

References

Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi: 10.1007/s11634-019-00373-8>.

See Also

MoE_clust, MoE_compare, MoE_control

Examples

# library(MoEClust)
# data(CO2data)
# Search over all models where the single covariate can enter either network
# (mod1 <- MoE_stepwise(CO2data$CO2, CO2data[,"GNP", drop=FALSE]))
#
# data(ais)
# Only look for EVE & EEE models with at most one expert network covariate
# Do not consider any gating covariates
# (mod2 <- MoE_stepwise(ais[,3:7], ais, gating=NA, expert="sex", modelNames=c("EVE", "EEE")))
#
# Look for models with a noise component, unequal mixing proportions,
# and only consider models with a constant mixing proportion for the noise component
# (mod3 <- MoE_stepwise(ais[,3:7], ais, noise=TRUE,  gating=c("SSF", "Ht"), expert="sex", 
#                       equalPro="no", noise.gate="no", modelNames="EEE"))
#
# Compare both sets of results (with & without a noise component) for the ais data
# (comp <- MoE_compare(mod2, mod3, optimal.only=TRUE))
# comp$optimal

Keefe-Murphy/MoEClust documentation built on Jan. 11, 2021, 6:34 p.m.