MoE_stepwise | R Documentation |
Conducts a greedy forward stepwise search to identify the optimal MoEClust model according to some criterion. Components and/or gating covariates and/or expert covariates are added to new MoE_clust fits at each step, while each step is evaluated for all valid modelNames.
MoE_stepwise(data,
network.data = NULL,
gating = NULL,
expert = NULL,
modelNames = NULL,
fullMoE = FALSE,
noise = FALSE,
initialModel = NULL,
initialG = NULL,
stepG = TRUE,
criterion = c("bic", "icl", "aic"),
equalPro = c("all", "both", "yes", "no"),
noise.gate = c("all", "both", "yes", "no"),
verbose = interactive(),
...)
data |
A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables. |
network.data |
An optional matrix or data frame in which to look for the covariates specified in the |
gating |
A vector giving the names of columns in If |
expert |
A vector giving the names of columns in If |
modelNames |
A character string of valid model names, to be used to restrict the size of the search space, if desired. By default, all valid model types are explored. Rather than considering the changing of the model type as an additional step, every step is evaluated over all entries in Note that if |
fullMoE |
A logical which, when Furthermore, when In addition, caution is advised using this argument in conjunction with |
noise |
A logical indicating whether to assume all models contain an additional noise component ( |
initialModel |
An object of class If However, while |
initialG |
A single (positive) integer giving the number of mixture components (clusters) to initialise the stepwise search algorithm with. This is a simpler alternative to the |
stepG |
A logical indicating whether the algorithm should consider incrementing the number of components at each step. Defaults to TRUE |
criterion |
The model selection criterion used to determine the optimal action at each step. Defaults to "bic" |
equalPro |
A character string indicating whether models with equal mixing proportions should be considered. The default ( Considering |
noise.gate |
A character string indicating whether models where the gating network for the noise component depends on covariates are considered. The default ( Considering |
verbose |
Logical indicating whether to print messages pertaining to progress to the screen during fitting. Defaults to interactive() |
... |
Additional arguments to |
The arguments modelNames, equalPro, and noise.gate are provided for computational convenience. They can be used to reduce the number of models under consideration at each stage.
The same is true of the arguments gating and expert, which can each separately (or jointly, if fullMoE is TRUE) be made to consider all variables in network.data, or a subset, or none at all.
Finally, initialModel or initialG can be used to kick-start the search algorithm by incorporating prior information more directly. With initialG, this information takes the form of the number of components only. With initialModel, a full model with a given number of components, certain included gating and expert network covariates, and a certain model type can give the search an even more informed head start. In either case, the stepG argument can be used to fix the number of components and only search over different configurations of covariates.
Without any prior information, it is best to accept the defaults at the expense of a longer run-time.
An object of class "MoECompare" containing information on all visited models and the optimal model (accessible via x$optimal).
It is advised to run this function once with noise=FALSE and once with noise=TRUE and then choose the optimal model across both sets of results.
At present, only additions (of components and covariates) are considered. In future updates, it may be possible to allow both additions and removals.
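The additions-only behaviour can be illustrated with a generic sketch of a greedy forward search. This is not the MoEClust implementation: it uses ordinary linear regression on the built-in mtcars data as a stand-in, the helper name forward_search is hypothetical, and it minimises stats::BIC (where smaller is better), a convention specific to this toy example.

```r
# Hypothetical helper, for illustration only (not part of MoEClust).
# At each step, try adding each unused candidate term, keep the single
# addition that most improves the criterion, and stop when none helps.
# Terms are only ever added, never removed.
forward_search <- function(response, candidates, data) {
  chosen <- character(0)
  # Start from the intercept-only model; stats::BIC is minimised here
  best <- BIC(lm(reformulate("1", response), data = data))
  repeat {
    remaining <- setdiff(candidates, chosen)
    if (length(remaining) == 0) break
    scores <- vapply(remaining, function(v) {
      BIC(lm(reformulate(c(chosen, v), response), data = data))
    }, numeric(1))
    if (min(scores) >= best) break  # no addition improves the criterion
    chosen <- c(chosen, remaining[which.min(scores)])
    best <- min(scores)
  }
  list(terms = chosen, bic = best)
}

fit <- forward_search("mpg", c("wt", "hp", "qsec", "drat"), mtcars)
fit$terms
```

Because the search is greedy, a term accepted early is never revisited, which is why a good starting point can matter.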
The function will attempt to remove duplicate variables found in both data and network.data; in particular, they will be removed from network.data. Users are, however, advised to carefully specify data and network.data such that there are no duplicates, especially if the desired variable(s) should belong to network.data.
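One way to follow this advice is to make the split explicit before calling the function. The sketch below uses base R only. The helper name drop_shared_columns and the toy data frames are hypothetical, and it assumes that duplicates are identified by column name, which may not match how the function itself detects them.

```r
# Hypothetical helper (not part of MoEClust): drop from `network.data` any
# column whose name also appears in `data`, so the two inputs do not overlap.
# Assumes both inputs are (or can be coerced to) data frames with column names.
drop_shared_columns <- function(data, network.data) {
  data <- as.data.frame(data)
  network.data <- as.data.frame(network.data)
  shared <- intersect(names(network.data), names(data))
  network.data[, setdiff(names(network.data), shared), drop = FALSE]
}

# Toy inputs standing in for the response variables and the covariates
resp <- data.frame(y1 = c(1.2, 3.4, 5.6), y2 = c(0.1, 0.2, 0.3))
covs <- data.frame(y2 = c(0.1, 0.2, 0.3), age = c(21, 35, 48),
                   sex = factor(c("f", "m", "f")))

covs_clean <- drop_shared_columns(resp, covs)
names(covs_clean)
```

If a shared variable should instead be treated as a covariate, the same idea applies in reverse: remove it from data and keep it in network.data.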
Finally, if the user intends to search for the best model according to the "icl" criterion, then specifying either initialModel or initialG is advisable. This is because the algorithm otherwise starts with a single component and thus there is no entropy term, meaning the stepwise search can quickly and easily get stuck at G=1. See the examples below.
Keefe Murphy - <keefe.murphy@mu.ie>
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
MoE_clust, MoE_compare, MoE_control
# data(CO2data)
# Search over all models where the single covariate can enter either network
# (mod1 <- MoE_stepwise(CO2data$CO2, CO2data[,"GNP", drop=FALSE]))
#
# data(ais)
# Only look for EVE & EEE models with at most one expert network covariate
# Do not consider any gating covariates and only consider models with equal mixing proportions
# (mod2 <- MoE_stepwise(ais[,3:7], ais, gating=NA, expert="sex",
# equalPro="yes", modelNames=c("EVE", "EEE")))
#
# Look for models with noise & only those where the noise component's mixing proportion is constant
# Speed up the search with an initialModel, fix G, and restrict the covariates & model type
# init <- MoE_clust(ais[,3:7], G=2, modelNames="EEE",
# expert= ~ sex, network.data=ais, tau0=0.1)
# (mod3 <- MoE_stepwise(ais[,3:7], ais, noise=TRUE, expert="sex",
# gating=c("SSF", "Ht"), noise.gate="no",
# initialModel=init, stepG=FALSE, modelNames="EEE"))
#
# Compare both sets of results (with & without a noise component) for the ais data
# (comp1 <- MoE_compare(mod2, mod3, optimal.only=TRUE))
# comp1$optimal
#
# Target a model for the AIS data which is optimal in terms of ICL, without any restrictions
# mod4 <- MoE_stepwise(ais[,3:7], ais, criterion="icl")
#
# This gets stuck at a G=1 model, so specify an initial G value as a head start
# mod5 <- MoE_stepwise(ais[,3:7], ais, criterion="icl", initialG=2)
#
# Check that specifying an initial G value enables a better model to be found
# (comp2 <- MoE_compare(mod4, mod5, optimal.only=TRUE, criterion="icl"))
# Finally, restrict the search to full MoE models only
# Notice that the candidate covariates are the union of gating and expert
# Notice also that the algorithm initially traverses models with only
# expert covariates when the inclusion of gating covariates is infeasible
# mod6 <- MoE_stepwise(ais[,3:7], ais, fullMoE=TRUE, gating="BMI", expert="Bfat")