MGHM (R Documentation)
Carries out model-based clustering using a multivariate generalized hyperbolic mixture (MGHM). The function determines whether the data set is complete or incomplete and fits the appropriate model accordingly. In the incomplete case, the data set must be at least bivariate, and missing values are assumed to be missing at random (MAR).
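The completeness check described above can be pictured with a small base-R sketch. The helper `check_input()` is hypothetical and is not part of MixtureMissing; it only mirrors the two rules stated here: detect whether any value is missing, and require at least two variables when some are.

```r
# Hypothetical helper (not part of MixtureMissing): mirrors the checks
# described above before a model would be fitted.
check_input <- function(X) {
  X <- as.matrix(X)
  ok_rows <- stats::complete.cases(X)   # TRUE for rows with no NA
  complete <- all(ok_rows)
  # Incomplete data must be at least bivariate
  if (!complete && ncol(X) < 2) {
    stop("incomplete data must have at least two variables")
  }
  list(complete = complete, n_incomplete_rows = sum(!ok_rows))
}
```

For example, `check_input(bankruptcy[, 2:3])` would report whether the two selected columns contain any missing values.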
MGHM(
X,
G,
model = c("GH", "NIG", "SNIG", "SC", "C", "St", "t", "N", "SGH", "HUM", "H", "SH"),
criterion = c("BIC", "AIC", "KIC", "KICc", "AIC3", "CAIC", "AICc", "ICL", "AWE", "CLC"),
max_iter = 20,
epsilon = 0.01,
init_method = c("kmedoids", "kmeans", "hierarchical", "mclust", "manual"),
clusters = NULL,
outlier_cutoff = 0.95,
deriv_ctrl = list(eps = 1e-08, d = 1e-04, zero.tol = sqrt(.Machine$double.eps/7e-07), r = 6, v = 2, show.details = FALSE),
progress = TRUE
)
X: An
G: An integer vector specifying the numbers of clusters, which must be at least 1.
model: A string indicating the mixture model to be fitted; "GH" for generalized hyperbolic by default. See the details section for a list of available distributions.
criterion: A character string indicating the information criterion used for model selection; "BIC" by default. See the details section for a list of available information criteria.
max_iter: (optional) A numeric value giving the maximum number of iterations each EM algorithm is allowed to use; 20 by default.
epsilon: (optional) A number specifying the epsilon value for the Aitken-based stopping criterion used in the EM algorithm; 0.01 by default.
init_method: (optional) A string specifying the method used to initialize the EM algorithm; "kmedoids" clustering is used by default. Alternative methods include "kmeans", "hierarchical", "mclust", and "manual". When "manual" is chosen, a vector must be supplied through clusters.
clusters: (optional) A vector of length
outlier_cutoff: (optional) A number between 0 and 1 indicating the percentile cutoff used for outlier detection. This is only relevant for the t mixture.
deriv_ctrl: (optional) A list of arguments controlling the numerical procedures used to calculate the first and second derivatives. Suggested values are provided by default. Refer to functions
progress: (optional) A logical value indicating whether the fitting progress should be displayed; TRUE by default.
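The epsilon argument refers to an Aitken-based stopping rule. A minimal sketch of one common form of that rule follows, assuming the log-likelihood values are stored oldest first. The function name `aitken_converged()` is hypothetical, and the exact variant used inside MGHM may differ (for example, in which iterate the asymptotic estimate is compared against).

```r
# Sketch of one common Aitken-based stopping rule for EM; not MGHM's code.
# loglik: log-likelihood values from successive iterations (oldest first).
aitken_converged <- function(loglik, epsilon = 0.01) {
  k <- length(loglik)
  if (k < 3) return(FALSE)            # need three consecutive values
  l_prev <- loglik[k - 2]
  l_curr <- loglik[k - 1]
  l_next <- loglik[k]
  denom <- l_curr - l_prev
  if (denom == 0) return(TRUE)        # log-likelihood has stopped changing
  a <- (l_next - l_curr) / denom      # Aitken acceleration
  if (a >= 1) return(FALSE)           # asymptotic estimate is not defined
  # Asymptotic (limiting) estimate of the log-likelihood
  l_inf <- l_curr + (l_next - l_curr) / (1 - a)
  abs(l_inf - l_curr) < epsilon
}
```

Inside an EM loop, the check would run after each iteration, e.g. `if (aitken_converged(loglik, epsilon)) break`, with max_iter acting as a hard upper bound.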
Besides the generalized hyperbolic distribution, the function can fit mixtures of its special and limiting cases. Available distributions include:
GH - Generalized Hyperbolic
NIG - Normal-Inverse Gaussian
SNIG - Symmetric Normal-Inverse Gaussian
SC - Skew-Cauchy
C - Cauchy
St - Skew-t
t - Student's t
N - Normal or Gaussian
SGH - Symmetric Generalized Hyperbolic
HUM - Hyperbolic Univariate Marginals
H - Hyperbolic
SH - Symmetric Hyperbolic
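The model codes above correspond to constraints on the GH parameters. The lookup below is a hypothetical sketch based on the standard special and limiting cases of the GH family (index lambda, concentration omega, skewness beta, and degrees of freedom df for the skew-t branch); whether MGHM parameterizes each case exactly this way is an assumption, and `model_constraints()` is not a MixtureMissing function.

```r
# Hypothetical lookup: parameters estimated beyond pi, mu and Sigma for each
# model code, and the values held fixed. d is the number of variables.
model_constraints <- function(model, d) {
  switch(model,
    GH   = list(free = c("beta", "lambda", "omega"), fixed = list()),
    SGH  = list(free = c("lambda", "omega"), fixed = list(beta = 0)),
    NIG  = list(free = c("beta", "omega"), fixed = list(lambda = -1/2)),
    SNIG = list(free = "omega", fixed = list(lambda = -1/2, beta = 0)),
    H    = list(free = c("beta", "omega"), fixed = list(lambda = (d + 1) / 2)),
    SH   = list(free = "omega", fixed = list(lambda = (d + 1) / 2, beta = 0)),
    HUM  = list(free = c("beta", "omega"), fixed = list(lambda = 1)),
    St   = list(free = c("beta", "df"), fixed = list()),
    t    = list(free = "df", fixed = list(beta = 0)),
    SC   = list(free = "beta", fixed = list(df = 1)),
    C    = list(free = character(0), fixed = list(df = 1, beta = 0)),
    N    = list(free = character(0), fixed = list()),
    stop("unknown model code: ", model)
  )
}
```

For instance, `model_constraints("SH", d = 2)$fixed` gives lambda = 1.5 and beta = 0. The "Only available if" notes in the value section reflect the same idea: a parameter is returned only for models that estimate it.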
Available information criteria include:
AIC - Akaike information criterion
BIC - Bayesian information criterion
KIC - Kullback information criterion
KICc - Corrected Kullback information criterion
AIC3 - Modified AIC
CAIC - Bozdogan's consistent AIC
AICc - Small-sample version of AIC
ICL - Integrated Completed Likelihood criterion
AWE - Approximate weight of evidence
CLC - Classification likelihood criterion
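Most of the criteria listed above are functions of the maximized log-likelihood, the number of free parameters, the sample size and, for ICL and CLC, the entropy of the posterior membership probabilities. The sketch below assumes the "smaller is better" convention (-2 times the log-likelihood plus a penalty); MGHM may report these values with a different sign or scaling, and KICc and AWE are omitted for brevity. The function name `info_criteria()` is hypothetical.

```r
# Hypothetical helper: information criteria on the "smaller is better" scale.
# loglik: maximized log-likelihood; npar: number of free parameters;
# n: number of observations; z: n x G matrix of posterior probabilities.
info_criteria <- function(loglik, npar, n, z) {
  # Entropy of the posterior probabilities; pmax() avoids 0 * log(0) = NaN
  ent <- -sum(z * log(pmax(z, .Machine$double.xmin)))
  dev <- -2 * loglik
  c(
    AIC  = dev + 2 * npar,
    BIC  = dev + npar * log(n),
    AIC3 = dev + 3 * npar,
    CAIC = dev + npar * (log(n) + 1),
    AICc = dev + 2 * npar + 2 * npar * (npar + 1) / (n - npar - 1),
    KIC  = dev + 3 * (npar + 1),
    ICL  = dev + npar * log(n) + 2 * ent,   # BIC penalized by entropy
    CLC  = dev + 2 * ent,
    ent  = ent
  )
}
```

With hard (0/1) memberships the entropy is zero, so ICL equals BIC and CLC equals the deviance; softer memberships increase both.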
An object of class MixtureMissing with:
model: The model used to fit the data set.
pi: Mixing proportions.
mu: Component location vectors.
Sigma: Component dispersion matrices.
beta: Component skewness vectors. Only available if
lambda: Component index parameters. Only available if
omega: Component concentration parameters. Only available if
df: Component degrees of freedom. Only available if
z_tilde: An
clusters: A numeric vector of length
outliers: A logical vector of length
data: The original data set if it is complete; otherwise, the data set with missing values imputed by the appropriate expectations.
complete: An
npar: The breakdown of the number of parameters to estimate.
max_iter: Maximum number of iterations allowed in the EM algorithm.
iter_stop: The actual number of iterations needed to fit the data set.
final_loglik: The final value of the log-likelihood.
loglik: All log-likelihood values recorded during fitting.
AIC: Akaike information criterion.
BIC: Bayesian information criterion.
KIC: Kullback information criterion.
KICc: Corrected Kullback information criterion.
AIC3: Modified AIC.
CAIC: Bozdogan's consistent AIC.
AICc: Small-sample version of AIC.
ent: Entropy.
ICL: Integrated Completed Likelihood criterion.
AWE: Approximate weight of evidence.
CLC: Classification likelihood criterion.
init_method: The initialization method used in model fitting.
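The clusters and outliers components, together with the outlier_cutoff argument, can be illustrated with a base-R sketch for a t mixture. It assumes maximum a posteriori (MAP) assignment from z_tilde and flags a point when its squared Mahalanobis distance delta to its assigned component is extreme: for a d-variate t with scale matrix Sigma and nu degrees of freedom, delta/d follows an F(d, nu) distribution. The function `flag_t_outliers()` is hypothetical, and MGHM's actual outlier rule may differ.

```r
# Sketch only: MAP clustering plus an F-quantile outlier flag for a t mixture.
# X: n x d complete data; z_tilde: n x G posterior probabilities;
# mu: list of G location vectors; Sigma: list of G d x d scale matrices;
# df: length-G degrees of freedom; cutoff plays the role of outlier_cutoff.
flag_t_outliers <- function(X, z_tilde, mu, Sigma, df, cutoff = 0.95) {
  X <- as.matrix(X)
  d <- ncol(X)
  clusters <- max.col(z_tilde, ties.method = "first")   # MAP assignment
  outliers <- logical(nrow(X))
  for (g in unique(clusters)) {
    idx <- which(clusters == g)
    # Squared Mahalanobis distance to the assigned component
    delta <- stats::mahalanobis(X[idx, , drop = FALSE], mu[[g]], Sigma[[g]])
    # Under a d-variate t with df nu, delta / d ~ F(d, nu)
    outliers[idx] <- delta / d > stats::qf(cutoff, d, df[g])
  }
  list(clusters = clusters, outliers = outliers)
}
```

A higher cutoff flags fewer points, since the F quantile it compares against grows with the cutoff.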
Browne, R. P. and McNicholas, P. D. (2015). A mixture of generalized hyperbolic distributions. Canadian Journal of Statistics, 43(2):176–198.
Wei, Y., Tang, Y., and McNicholas, P. D. (2019). Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data. Computational Statistics & Data Analysis, 130:18–41.
library(MixtureMissing)
data('bankruptcy')
#++++ With no missing values ++++#
X <- bankruptcy[, 2:3]
mod <- MGHM(X, G = 2, init_method = 'kmedoids', max_iter = 10)
summary(mod)
plot(mod)
#++++ With missing values ++++#
set.seed(1234)
X <- hide_values(bankruptcy[, 2:3], prop_cases = 0.1)
mod <- MGHM(X, G = 2, init_method = 'kmedoids', max_iter = 10)
summary(mod)
plot(mod)
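The second example relies on hide_values() to introduce missing values. The hypothetical `mask_cases()` below is a base-R sketch of one way to do that, not the algorithm hide_values() uses: it picks round(prop_cases * n) rows and removes one randomly chosen value from each. Because values are removed completely at random, the result satisfies the MAR assumption, and with at least two variables no row becomes entirely missing.

```r
# Hypothetical helper (not MixtureMissing::hide_values): make roughly
# prop_cases of the rows incomplete by hiding one value in each.
# Assumes X has at least two columns, as MGHM requires for incomplete data.
mask_cases <- function(X, prop_cases = 0.1) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  rows <- sample.int(n, size = round(prop_cases * n))   # distinct rows
  cols <- sample.int(d, size = length(rows), replace = TRUE)
  X[cbind(rows, cols)] <- NA        # one missing cell per selected row
  X
}
```

Calling `set.seed()` first, as in the example above, makes the masking reproducible.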