select_mixture | R Documentation |
Fit mixtures via various distributions and select the best model based on a given information criterion. The distributions include the multivariate contaminated normal, the multivariate generalized hyperbolic, and special and limiting cases of the multivariate generalized hyperbolic.
select_mixture(
  X,
  G,
  model = c("CN", "GH", "NIG", "SNIG", "SC", "C", "St", "t", "N", "SGH", "HUM", "H", "SH"),
  criterion = c("BIC", "AIC", "KIC", "KICc", "AIC3", "CAIC", "AICc", "ICL", "AWE", "CLC"),
  max_iter = 20,
  epsilon = 0.01,
  init_method = c("kmedoids", "kmeans", "hierarchical", "manual"),
  clusters = NULL,
  eta_min = 1.001,
  outlier_cutoff = 0.95,
  deriv_ctrl = list(eps = 1e-08, d = 1e-04, zero.tol = sqrt(.Machine$double.eps/7e-07),
    r = 6, v = 2, show.details = FALSE),
  progress = TRUE
)
X |
An |
G |
The number of clusters, which must be at least 1. If |
model |
A vector of character strings indicating the mixture model(s) to be fitted. By default, all distributions are considered. See the details section for a list of available distributions. |
criterion |
A character string indicating the information criterion for model selection. "BIC" is used by default. See the details section for a list of available information criteria. |
max_iter |
(optional) A numeric value giving the maximum number of iterations each EM algorithm is allowed to use; 20 by default. |
epsilon |
(optional) A number specifying the epsilon value for the Aitken-based stopping criterion used in the EM algorithm; 0.01 by default. |
init_method |
(optional) A string specifying the method to initialize
the EM algorithm. "kmedoids" clustering is used by default. Alternative
methods include "kmeans", "hierarchical", and "manual". When "manual" is chosen,
a vector |
clusters |
(optional) A vector of length |
eta_min |
(optional) A numeric value slightly greater than 1 specifying the minimum value of eta; 1.001 by default. This is only relevant for the CN mixture |
outlier_cutoff |
(optional) A number between 0 and 1 indicating the percentile cutoff used for outlier detection; 0.95 by default. This is only relevant for the t mixture. |
deriv_ctrl |
(optional) A list containing arguments to control the numerical
procedures for calculating the first and second derivatives. Some values are
suggested by default. Refer to functions |
progress |
(optional) A logical value indicating whether the fitting progress should be displayed; TRUE by default. |
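The epsilon argument above refers to an Aitken-based stopping criterion. As a rough illustration only, the sketch below implements one common variant of such a rule in base R: it estimates the asymptotic log-likelihood from the last three values and stops when the remaining gap falls below epsilon. The function name aitken_converged and the exact form of the rule are assumptions made for illustration; the package's internal rule may differ.

```r
# Hypothetical helper (not part of the package): Aitken-style stopping rule.
# ll      : numeric vector of log-likelihood values from successive EM iterations
# epsilon : tolerance on the estimated remaining gain in log-likelihood
aitken_converged <- function(ll, epsilon = 0.01) {
  k <- length(ll)
  if (k < 3) return(FALSE)              # three values are needed for the ratio

  d_new <- ll[k] - ll[k - 1]            # most recent increase
  d_old <- ll[k - 1] - ll[k - 2]        # previous increase
  if (d_new == 0) return(TRUE)          # log-likelihood no longer changes

  a <- d_new / d_old                    # Aitken acceleration ratio
  if (!is.finite(a) || a < 0 || a >= 1) return(FALSE)

  ll_inf <- ll[k - 1] + d_new / (1 - a) # estimated asymptotic log-likelihood
  gap <- ll_inf - ll[k]                 # estimated gain still to come
  gap >= 0 && gap < epsilon
}

# Toy sequence that converges geometrically to -100 (made-up values)
ll_short <- -100 - 10 * 0.5^(1:3)
ll_long  <- -100 - 10 * 0.5^(1:10)
print(aitken_converged(ll_short))
print(aitken_converged(ll_long))
```

A rule of this kind looks at the projected limit of the log-likelihood rather than at the last increment alone, so a slowly converging run is not stopped merely because its steps have become small.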
The function can fit mixtures via the contaminated normal distribution, the generalized hyperbolic distribution, and special and limiting cases of the generalized hyperbolic distribution. The available distributions are:
CN - Contaminated Normal
GH - Generalized Hyperbolic
NIG - Normal-Inverse Gaussian
SNIG - Symmetric Normal-Inverse Gaussian
SC - Skew-Cauchy
C - Cauchy
St - Skew-t
t - Student's t
N - Normal or Gaussian
SGH - Symmetric Generalized Hyperbolic
HUM - Hyperbolic Univariate Marginals
H - Hyperbolic
SH - Symmetric Hyperbolic
Available information criteria include
AIC - Akaike information criterion
BIC - Bayesian information criterion
KIC - Kullback information criterion
KICc - Corrected Kullback information criterion
AIC3 - Modified AIC
CAIC - Bozdogan's consistent AIC
AICc - Small-sample version of AIC
ICL - Integrated Completed Likelihood criterion
AWE - Approximate weight of evidence
CLC - Classification likelihood criterion
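To make the role of the criterion concrete, the following base R sketch computes the textbook forms of five of the criteria listed above from a log-likelihood, a parameter count, and a sample size, then ranks a few candidate models by BIC. The helper info_criteria, the model log-likelihoods, and the parameter counts are all hypothetical values chosen for illustration. The sketch uses the "smaller is better" convention; the sign convention used inside the package is not stated here and may differ.

```r
# Hypothetical helper (not part of the package): textbook information criteria.
# loglik : maximized log-likelihood
# k      : number of free parameters
# n      : number of observations
info_criteria <- function(loglik, k, n) {
  c(
    AIC  = -2 * loglik + 2 * k,
    BIC  = -2 * loglik + k * log(n),
    AIC3 = -2 * loglik + 3 * k,
    CAIC = -2 * loglik + k * (log(n) + 1),
    AICc = -2 * loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)
  )
}

# Made-up fits for three candidate models (illustrative numbers only)
fits <- data.frame(
  model  = c("N", "t", "GH"),
  loglik = c(-250, -240, -238),
  k      = c(11, 13, 17)
)
n <- 100

# BIC of every candidate, named by model and sorted from best to worst
bic <- vapply(
  seq_len(nrow(fits)),
  function(i) info_criteria(fits$loglik[i], fits$k[i], n)[["BIC"]],
  numeric(1)
)
names(bic) <- fits$model
ranked <- sort(bic)
print(ranked)
```

Richer models raise the log-likelihood but pay a larger penalty through k, which is why the model with the highest log-likelihood is not necessarily ranked first.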
A list with
best_mod |
An object of class |
all_mod |
A list of objects of class |
criterion |
A numeric vector containing the chosen information criterion values of all models under consideration. The vector is ordered from the best model to the worst. |
Each object of class MixtureMissing
has slots that depend on the fitted model. See
the return values of MCNM and MGHM.
Browne, R. P. and McNicholas, P. D. (2015). A mixture of generalized hyperbolic distributions.
Canadian Journal of Statistics, 43(2):176–198.
Wei, Y., Tang, Y., and McNicholas, P. D. (2019). Mixtures of generalized hyperbolic
distributions and mixtures of skew-t distributions for model-based clustering
with incomplete data. Computational Statistics & Data Analysis, 130:18–41.
data('bankruptcy')
#++++ With no missing values ++++#
X <- bankruptcy[, 2:3]
mod <- select_mixture(X, G = 2, model = c('CN', 'GH', 'St'), criterion = 'BIC', max_iter = 10)
#++++ With missing values ++++#
set.seed(1234)
X <- hide_values(bankruptcy[, 2:3], prop_cases = 0.1)
mod <- select_mixture(X, G = 2, model = c('CN', 'GH', 'St'), criterion = 'BIC', max_iter = 10)
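The list described in the Value section can be inspected after fitting. The sketch below assumes the function is provided by an installed package named MixtureMissing (inferred from the class name, not stated on this page) and that the bankruptcy data set ships with it; the code is skipped when the package is unavailable. Only the documented components best_mod, all_mod, and criterion are accessed.

```r
# Runs only if the assumed package is installed; otherwise nothing happens.
if (requireNamespace("MixtureMissing", quietly = TRUE)) {
  library(MixtureMissing)
  data("bankruptcy", package = "MixtureMissing")

  X <- bankruptcy[, 2:3]
  res <- select_mixture(X, G = 2, model = c("CN", "GH", "St"),
                        criterion = "BIC", max_iter = 10, progress = FALSE)

  print(names(res))          # components of the returned list
  print(res$criterion)       # criterion values, ordered best to worst
  print(class(res$best_mod)) # class of the selected model object
  print(length(res$all_mod)) # number of fitted candidate models
}
```

Setting progress = FALSE keeps the output limited to the printed components.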