MCNM: Multivariate Contaminated Normal Mixture (MCNM)
In MixtureMissing: Robust and Flexible Model-Based Clustering for Data Sets with Missing Values at Random

View source: R/MCNM.R

MCNM	R Documentation

Multivariate Contaminated Normal Mixture (MCNM)

Description

Carries out model-based clustering using a multivariate contaminated normal mixture (MCNM). The function automatically determines whether the data set is complete or incomplete and fits the appropriate model. In the incomplete case, the data set must be at least bivariate, and missing values are assumed to be missing at random (MAR).
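
As an illustration of what "complete" versus "incomplete" means here, the following base R sketch checks a toy data frame for missing cells and applies plain mean imputation. The data values and variable names are made up for illustration; this is not the package's internal code, only a sketch of the idea under those assumptions.

```r
# Toy data with two missing cells (values are made up for illustration)
X_toy <- data.frame(
  x1 = c(1.0, 2.0, NA, 4.0),
  x2 = c(10, NA, 30, 40)
)

# TRUE when at least one cell is missing, i.e. the data set is incomplete
is_incomplete <- anyNA(X_toy)

# Logical matrix marking which cells are observed
observed <- !is.na(X_toy)

# Mean imputation: replace each missing cell with the mean of the
# observed values in the same column
X_imputed <- X_toy
for (j in seq_along(X_imputed)) {
  miss <- is.na(X_imputed[[j]])
  X_imputed[[j]][miss] <- mean(X_imputed[[j]], na.rm = TRUE)
}
```

Mean imputation of this kind is only a convenient starting point; it ignores the correlation between variables, which is why a model-based treatment of the missing cells is preferable during fitting.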

Usage

MCNM(
  X,
  G,
  criterion = c("BIC", "AIC", "KIC", "KICc", "AIC3", "CAIC", "AICc", "ICL", "AWE", "CLC"),
  max_iter = 20,
  epsilon = 0.01,
  init_method = c("kmedoids", "kmeans", "hierarchical", "mclust", "manual"),
  clusters = NULL,
  eta_min = 1.001,
  progress = TRUE
)

Arguments

`X`	An `n` x `d` matrix or data frame where `n` is the number of observations and `d` is the number of variables.
`G`	An integer vector specifying the numbers of clusters, each of which must be at least 1.
`criterion`	A character string indicating the information criterion for model selection. "BIC" is used by default. See the Details section for a list of available information criteria.
`max_iter`	(optional) A numeric value giving the maximum number of iterations each EM algorithm is allowed to use; 20 by default.
`epsilon`	(optional) A number specifying the epsilon value for the Aitken-based stopping criterion used in the EM algorithm; 0.01 by default.
`init_method`	(optional) A string specifying the method used to initialize the EM algorithm. "kmedoids" clustering is used by default. Alternative methods include "kmeans", "hierarchical", "mclust", and "manual". When "manual" is chosen, a vector `clusters` of length `n` must be specified. If the data set is incomplete, missing values are first filled in by mean imputation.
`clusters`	(optional) A numeric vector of length `n` that specifies the user-supplied initial cluster memberships when `init_method` is set to "manual". This argument is NULL by default, so it is ignored whenever another initialization method is chosen.
`eta_min`	(optional) A numeric value slightly greater than 1 specifying the minimum value of eta; 1.001 by default.
`progress`	(optional) A logical value indicating whether the fitting progress should be displayed; TRUE by default.
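
To make the roles of `max_iter` and `epsilon` more concrete, here is a minimal sketch of a typical Aitken-based stopping rule. It assumes the common formulation in which the asymptotic log-likelihood is estimated from the last three log-likelihood values, and the algorithm stops once that estimate is within `epsilon` of the previous log-likelihood. The helper name `aitken_converged` is hypothetical, and the package's own implementation may differ in detail.

```r
# Hypothetical helper: decide whether to stop, given the log-likelihood
# values recorded so far (one per EM iteration).
aitken_converged <- function(loglik, epsilon = 0.01) {
  k <- length(loglik)
  if (k < 3) return(FALSE)            # three consecutive values are needed

  l_prev <- loglik[k - 2]
  l_curr <- loglik[k - 1]
  l_next <- loglik[k]

  if (l_next == l_curr) return(TRUE)  # no further change at all
  denom <- l_curr - l_prev
  if (denom == 0) return(FALSE)       # acceleration factor is undefined

  # Aitken acceleration factor and the implied asymptotic log-likelihood
  a <- (l_next - l_curr) / denom
  l_inf <- l_curr + (l_next - l_curr) / (1 - a)

  # Stop when the asymptotic estimate is within epsilon of the
  # previous log-likelihood value
  gap <- l_inf - l_curr
  is.finite(gap) && gap >= 0 && gap < epsilon
}

# Made-up log-likelihood histories for illustration
loglik_far  <- c(-100, -95, -92.5)       # still improving noticeably
loglik_near <- c(-10, -9.998, -9.997)    # improvements have nearly vanished

far_done  <- aitken_converged(loglik_far)
near_done <- aitken_converged(loglik_near)
```

In an EM loop, a check like this would be evaluated after each iteration, with `max_iter` acting as a hard cap when the criterion is never met.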

Details

Available information criteria include:

AIC - Akaike information criterion
BIC - Bayesian information criterion
KIC - Kullback information criterion
KICc - Corrected Kullback information criterion
AIC3 - Modified AIC
CAIC - Bozdogan's consistent AIC
AICc - Small-sample version of AIC
ICL - Integrated Completed Likelihood criterion
AWE - Approximate weight of evidence
CLC - Classification likelihood criterion
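
For orientation, the sketch below computes several of these criteria from a maximized log-likelihood using their standard textbook definitions in the "smaller is better" form. The helper name `info_criteria` and the input numbers are hypothetical, and the package may use a different sign or scaling convention, so treat this as an illustration of the formulas rather than a reproduction of the package's output. KIC, KICc, ICL, AWE, and CLC are omitted here.

```r
# Hypothetical helper.
#   loglik: maximized log-likelihood
#   npar:   number of free parameters
#   n:      number of observations
info_criteria <- function(loglik, npar, n) {
  c(
    AIC  = -2 * loglik + 2 * npar,
    BIC  = -2 * loglik + npar * log(n),
    AIC3 = -2 * loglik + 3 * npar,
    CAIC = -2 * loglik + npar * (log(n) + 1),
    AICc = -2 * loglik + 2 * npar + 2 * npar * (npar + 1) / (n - npar - 1)
  )
}

# Made-up inputs for illustration
ic <- info_criteria(loglik = -250, npar = 10, n = 100)
print(round(ic, 2))
```

All five share the same goodness-of-fit term and differ only in how heavily they penalize the number of parameters, which is why they can disagree on the preferred number of clusters.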

Value

An object of class MixtureMissing with:

`model`	The model used to fit the data set.
`pi`	Mixing proportions.
`mu`	Component location vectors.
`Sigma`	Component dispersion matrices.
`alpha`	Component proportions of good observations.
`eta`	Component degrees of contamination.
`z_tilde`	An `n` by `G` matrix where each row indicates the expected probabilities that the corresponding observation belongs to each cluster.
`v_tilde`	An `n` by `G` matrix where each row indicates the expected probabilities that the corresponding observation is good with respect to each cluster.
`clusters`	A numeric vector of length `n` indicating cluster memberships determined by the model.
`outliers`	A logical vector of length `n` indicating observations that are outliers.
`data`	The original data set if it is complete; otherwise, this is the data set with missing values imputed by appropriate expectations.
`complete`	An `n` by `d` logical matrix indicating which cells have no missing values.
`npar`	The breakdown of the number of parameters to estimate.
`max_iter`	Maximum number of iterations allowed in the EM algorithm.
`iter_stop`	The actual number of iterations needed when fitting the data set.
`final_loglik`	The final value of log-likelihood.
`loglik`	All the values of log-likelihood.
`AIC`	Akaike information criterion.
`BIC`	Bayesian information criterion.
`KIC`	Kullback information criterion.
`KICc`	Corrected Kullback information criterion.
`AIC3`	Modified AIC.
`CAIC`	Bozdogan's consistent AIC.
`AICc`	Small-sample version of AIC.
`ent`	Entropy.
`ICL`	Integrated Completed Likelihood criterion.
`AWE`	Approximate weight of evidence.
`CLC`	Classification likelihood criterion.
`init_method`	The initialization method used in model fitting.
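
To clarify how `alpha`, `eta`, and `v_tilde` relate, the following base R sketch evaluates a single contaminated normal component as defined in Punzo and McNicholas (2016): a mixture of a "good" normal with dispersion `Sigma` (weight `alpha`) and a "bad" normal with the same location and inflated dispersion `eta * Sigma` (weight `1 - alpha`). The helper names and parameter values are hypothetical, and the 0.5 threshold used to flag an outlier is a commonly used rule assumed here for illustration; this is not the package's internal code.

```r
# Multivariate normal density for a single observation x (base R only)
dmvnorm_one <- function(x, mu, Sigma) {
  d <- length(mu)
  dev <- x - mu
  maha <- drop(t(dev) %*% solve(Sigma, dev))  # squared Mahalanobis distance
  exp(-0.5 * maha) / sqrt((2 * pi)^d * det(Sigma))
}

# Contaminated normal density: good part plus inflated bad part
dcn_one <- function(x, mu, Sigma, alpha, eta) {
  alpha * dmvnorm_one(x, mu, Sigma) +
    (1 - alpha) * dmvnorm_one(x, mu, eta * Sigma)
}

# Posterior probability that x is a good observation for this component
prob_good <- function(x, mu, Sigma, alpha, eta) {
  alpha * dmvnorm_one(x, mu, Sigma) / dcn_one(x, mu, Sigma, alpha, eta)
}

# Made-up component parameters for illustration
mu    <- c(0, 0)
Sigma <- diag(2)

v_center <- prob_good(c(0, 0), mu, Sigma, alpha = 0.9, eta = 10)
v_far    <- prob_good(c(6, 6), mu, Sigma, alpha = 0.9, eta = 10)

# Assumed rule: flag as an outlier when the goodness probability is below 0.5
is_outlier <- c(center = v_center < 0.5, far = v_far < 0.5)
```

A point at the component's center is almost certainly good, whereas a point far in the tail is better explained by the inflated component and is flagged as an outlier.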

References

Punzo, A. and McNicholas, P.D., 2016. Parsimonious mixtures of multivariate contaminated normal distributions. Biometrical Journal, 58(6), pp.1506-1537.

Tong, H. and Tortora, C., 2022. Model-based clustering and outlier detection with missing data. Advances in Data Analysis and Classification.

Examples


library(MixtureMissing)

data('auto')

#++++ With no missing values ++++#

X <- auto[, c('engine_size', 'city_mpg', 'highway_mpg')]
mod <- MCNM(X, G = 2, init_method = 'kmedoids', max_iter = 10)

summary(mod)
plot(mod)

#++++ With missing values ++++#

X <- auto[, c('normalized_losses', 'horsepower', 'highway_mpg', 'price')]
mod <- MCNM(X, G = 2, init_method = 'kmedoids', max_iter = 10)

summary(mod)
plot(mod)

MixtureMissing documentation built on April 4, 2025, 3:38 a.m.
