
 REBMIX-methods R Documentation

REBMIX Algorithm for Univariate or Multivariate Finite Mixture Estimation

Description

By default, returns the REBMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If model equals "REBMVNORM", the output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
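
As a reading aid, the two model types can be written out as follows. This is a sketch in the usual finite mixture notation, assuming c components with weights w_{l}, component parameters \theta_{il} for variable i in component l, and, for "REBMVNORM", mean vectors \boldsymbol{\mu}_{l} and variance-covariance matrices \boldsymbol{\Sigma}_{l}; the symbols not used elsewhere on this page are assumptions made for illustration.

```latex
% model = "REBMIX": conditionally independent component densities
f(\mathbf{y}) = \sum_{l=1}^{c} w_{l} \prod_{i=1}^{d} f_{i}(y_{i} \mid \theta_{il}),
\qquad \sum_{l=1}^{c} w_{l} = 1, \quad w_{l} > 0

% model = "REBMVNORM": multivariate normal components, unrestricted covariances
f(\mathbf{y}) = \sum_{l=1}^{c} w_{l} \,
  \mathcal{N}(\mathbf{y} \mid \boldsymbol{\mu}_{l}, \boldsymbol{\Sigma}_{l})
```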

Usage

## S4 method for signature 'REBMIX'
REBMIX(model = "REBMIX", Dataset = list(), Preprocessing = character(),
       cmax = 15, cmin = 1, Criterion = "AIC", pdf = character(),
       theta1 = numeric(), theta2 = numeric(), theta3 = numeric(), K = "auto",
       ymin = numeric(), ymax = numeric(), ar = 0.1,
       Restraints = "loose", EMcontrol = NULL, ...)
## ... and for other signatures
## S4 method for signature 'REBMIX'
summary(object, ...)
## ... and for other signatures


Arguments

model

see Methods section below.

Dataset

a list of length n_{\mathrm{D}} of data frames or objects of class Histogram. Each data frame should be of size n \times d and contain a d-dimensional dataset. Each of the d columns represents one random variable. The number of observations n equals the number of rows in the dataset.

Preprocessing

a character giving the preprocessing type. One of "histogram", "kernel density estimation" or "k-nearest neighbour".

cmax

maximum number of components c_{\mathrm{max}} > 0. The default value is 15.

cmin

minimum number of components c_{\mathrm{min}} > 0. The default value is 1.

Criterion

a character giving the information criterion type. One of default Akaike "AIC", "AIC3", "AIC4" or "AICc", Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn "HQC", minimum description length "MDL2" or "MDL5", approximate weight of evidence "AWE", classification likelihood "CLC", integrated classification likelihood "ICL" or "ICL-BIC", partition coefficient "PC", total of positive relative deviations "D" or sum of squares error "SSE".

pdf

a character vector of length d containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or "vonMises".

theta1

a vector of length d containing initial component parameters. One of n_{il} = \textrm{number of categories} - 1 for the "binomial" distribution.

theta2

a vector of length d containing initial component parameters. Currently not used.

theta3

a vector of length d containing initial component parameters. One of \xi_{il} \in \{-1, \textrm{NA}, 1\} for the "Gumbel" distribution.

K

a character, a vector or a matrix of size n_{\mathrm{D}} \times d containing numbers of bins v or v_{1}, \ldots, v_{d} for the histogram and the kernel density estimation, or numbers of nearest neighbours k for the k-nearest neighbour. There is no genuine rule to identify v or k. Consequently, the REBMIX algorithm identifies them from the set K of input values by minimizing the information criterion. The Sturges rule v = 1 + \log_{2}(n), the \mathrm{Log}_{10} rule v = 10 \log_{10}(n) or the RootN rule v = 2 \sqrt{n} can be applied to estimate the limiting numbers of bins, or the rule of thumb k = \sqrt{n} to guess the intermediate number of nearest neighbours. If, e.g., K = c(10, 20, 40, 60) and the minimum IC coincides with, e.g., 40, the brackets are set to 20 and 60 and the golden section is applied to refine the minimum search. If, e.g., K = matrix(c(10, 15, 18, 5, 7, 9), byrow = TRUE, ncol = 3), then d = 3 and the list Dataset contains n_{\mathrm{D}} = 2 data frames. Hence, different numbers of bins can be assigned to y_{1}, \ldots, y_{d}. See also kseq for generating a sequence of bins or nearest neighbours. The default value is "auto".

ymin

a vector of length d containing minimum observations. The default value is numeric().

ymax

a vector of length d containing maximum observations. The default value is numeric().

ar

acceleration rate 0 < a_{\mathrm{r}} \leq 1. The default value is 0.1 and in most cases does not have to be altered.

Restraints

a character giving the restraints type. One of "rigid" or default "loose". The rigid restraints are obsolete and applicable to well separated components only.

EMcontrol

an object of class EM.Control.

object

see Methods section below.

...

currently not used.
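
The bin-count rules and the bracketing step described for K can be sketched in base R. This is an illustrative sketch only: the sample size n, the stand-in criterion ic() and the helper golden_refine() are hypothetical and are not part of the rebmix package, whose internal search may differ.

```r
# Hypothetical sample size, chosen only for illustration.
n <- 1000

v_sturges <- 1 + log2(n)    # Sturges rule
v_log10   <- 10 * log10(n)  # Log10 rule
v_rootn   <- 2 * sqrt(n)    # RootN rule
k_thumb   <- sqrt(n)        # rule of thumb for nearest neighbours

# Matrix form of K: one row per dataset (n_D = 2), one column per variable (d = 3).
K_mat <- matrix(c(10, 15, 18, 5, 7, 9), byrow = TRUE, ncol = 3)

# Stand-in information criterion with its minimum at v = 37 (an assumption);
# in practice every evaluation would correspond to a full mixture fit.
ic <- function(v) (v - 37)^2

# Step 1: evaluate the criterion on the input set K and bracket the minimum.
K  <- c(10, 20, 40, 60)
j  <- which.min(sapply(K, ic))
lo <- K[max(j - 1, 1)]
hi <- K[min(j + 1, length(K))]

# Step 2: golden section refinement on the integers between the brackets.
golden_refine <- function(ic, lo, hi) {
  phi <- (sqrt(5) - 1) / 2
  a <- lo
  b <- hi
  while (b - a > 2) {
    x1 <- round(b - phi * (b - a))
    x2 <- round(a + phi * (b - a))
    if (x1 == x2) x2 <- x1 + 1  # keep the two probe points distinct
    if (ic(x1) <= ic(x2)) b <- x2 else a <- x1
  }
  cand <- a:b
  cand[which.min(sapply(cand, ic))]
}

v_best <- golden_refine(ic, lo, hi)
```

With the stand-in criterion, the minimum over K is found at 40, so the brackets become 20 and 60 as in the text, and the refinement then searches only between them.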

Value

Returns an object of class REBMIX or REBMVNORM.

Methods

signature(model = "REBMIX")

a character giving the default class name "REBMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.

signature(model = "REBMVNORM")

a character giving the class name "REBMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.

signature(object = "REBMIX")

an object of class REBMIX.

signature(object = "REBMVNORM")

an object of class REBMVNORM.

Author(s)

Marko Nagode

References

H. A. Sturges. The choice of a class interval. Journal of the American Statistical Association, 21(153):65-66, 1926. https://www.jstor.org/stable/2965501.

P. F. Velleman. Interactive computing for exploratory data analysis I: display algorithms. Proceedings of the Statistical Computing Section, American Statistical Association, 1976.

W. J. Dixon and R. A. Kronmal. The choice of origin and scale for graphs. Journal of the ACM, 12(2):259-261, 1965. doi: 10.1145/321264.321277.

M. Nagode and M. Fajdiga. A general multi-modal probability density function suitable for the rainflow ranges of stationary random processes. International Journal of Fatigue, 20(3):211-223, 1998. doi: 10.1016/S0142-1123(97)00106-0.

M. Nagode and M. Fajdiga. An improved algorithm for parameter estimation suitable for mixed Weibull distributions. International Journal of Fatigue, 22(1):75-80, 2000. doi: 10.1016/S0142-1123(99)00112-7.

M. Nagode, J. Klemenc and M. Fajdiga. Parametric modelling and scatter prediction of rainflow matrices. International Journal of Fatigue, 23(6):525-532, 2001. doi: 10.1016/S0142-1123(01)00007-X.

M. Nagode and M. Fajdiga. An alternative perspective on the mixture estimation problem. Reliability Engineering & System Safety, 91(4):388-397, 2006. doi: 10.1016/j.ress.2005.02.005.

M. Nagode and M. Fajdiga. The REBMIX algorithm for the univariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi: 10.1080/03610920903480890.

M. Nagode and M. Fajdiga. The REBMIX algorithm for the multivariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi: 10.1080/03610921003725788.

M. Nagode. Finite mixture modeling via REBMIX. Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.

B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation. Mathematics, 8(3):373, 2020. doi: 10.3390/math8030373.

Examples

# Generate and plot univariate normal dataset.

n <- c(998, 263, 1086, 487)

Theta <- new("RNGMIX.Theta", c = 4, pdf = "normal")

a.theta1(Theta) <- c(688, 265, 30, 934)
a.theta2(Theta) <- c(72, 54, 34, 28)

normal <- RNGMIX(Dataset.name = "complex1",
                 rseed = -1,
                 n = n,
                 Theta = a.Theta(Theta))

normal

a.Dataset(normal, 1)[1:20,]

# Estimate number of components, component weights and component parameters.

normalest <- REBMIX(Dataset = a.Dataset(normal),
                    Preprocessing = "h",
                    cmax = 8,
                    Criterion = "BIC",
                    pdf = "n")

normalest

BIC(normalest)

logL(normalest)

# Plot finite mixture.

plot(normalest, nrow = 2, what = c("pdf", "marginal cdf"), npts = 1000)

# Use the EM algorithm on the iris dataset.

data(iris)

Dataset <- list(data.frame(iris[, c(1:4)]))

# Create EM.Control object.

EM <- new("EM.Control",
          strategy = "exhaustive",
          variant = "EM",
          acceleration = "fixed",
          tolerance = 1e-4,
          acceleration.multiplier = 1.0,
          maximum.iterations = 1000)

# Mixture parameter estimation using REBMIX and EM algorithm.

irisest <- REBMIX(model = "REBMVNORM",
                  Dataset = Dataset,
                  Preprocessing = "histogram",
                  cmax = 10,
                  Criterion = "BIC",
                  EMcontrol = EM)

irisest

# Print the total number of EM iterations used in the exhaustive strategy from the summary.EM slot.

a.summary.EM(irisest, col.name = "total.iterations.nbr", pos = 1)


rebmix documentation built on Aug. 18, 2022, 1:06 a.m.