fitmixture: Estimating parameters of the well-known mixture models


fitmixture R Documentation

Estimating parameters of the well-known mixture models

Description

Estimates the parameters of a mixture model using the expectation-maximization (EM) algorithm. The general form of the cdf of a statistical mixture model is given by

F(x,Θ) = ∑_{j=1}^{K} ω_j F_j(x,θ_j),

where Θ=(θ_1,…,θ_K)^T is the whole parameter vector, θ_j=(α_j,β_j)^{T} for j=1,…,K is the parameter vector of the j-th component, F_j(.,θ_j) is the cdf of the j-th component, and the known constant K is the number of components. The parameters α and β are either the shape and scale parameters or both shape parameters; in the latter case, α and β are called the first and second shape parameters, respectively. The weights ω_j sum to one, i.e. ∑_{j=1}^{K}ω_j=1. The families considered for the cdf F are Birnbaum-Saunders, Burr type XII, Chen, F, Frechet, Gamma, Gompertz, Log-normal, Log-logistic, Lomax, skew-normal, and Weibull.
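To make the mixture cdf concrete, the following base-R sketch evaluates it for Weibull components. The helper name mixture_cdf and all numeric values are illustrative assumptions; this is not a ForestFit function and does not reproduce the package's internals.

```r
# Mixture cdf F(x, Theta) = sum_j omega_j * F_j(x, theta_j) with Weibull components.
# mixture_cdf is a hypothetical helper, not part of ForestFit.
mixture_cdf <- function(x, omega, alpha, beta) {
  stopifnot(length(omega) == length(alpha), length(alpha) == length(beta))
  stopifnot(abs(sum(omega) - 1) < 1e-8)  # the weights must sum to one
  # One column per component: F_j evaluated at every x.
  comp <- sapply(seq_along(omega), function(j) {
    pweibull(x, shape = alpha[j], scale = beta[j])
  })
  comp <- matrix(comp, nrow = length(x))  # keep the matrix shape when length(x) == 1
  as.vector(comp %*% omega)               # weighted sum over components
}

omega <- c(0.3, 0.5, 0.2)    # illustrative weights
alpha <- c(1.5, 3.0, 5.0)    # illustrative shape parameters
beta  <- c(4.0, 9.0, 15.0)   # illustrative scale parameters
mixture_cdf(c(0, 5, 10, 50), omega, alpha, beta)
```

Because each F_j is a cdf and the weights are non-negative and sum to one, the result is itself a non-decreasing function taking values between 0 and 1.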

Usage

fitmixture(data, family, K, initial=FALSE, starts)

Arguments

data

Vector of observations.

family

Name of the family, one of: "birnbaum-saunders", "burrxii", "chen", "f", "Frechet", "gamma", "gompetrz", "log-normal", "log-logistic", "lomax", "skew-normal", and "weibull".

K

Number of components.

initial

Logical. If TRUE, the initial values are supplied by the user as the sequence ω_1,…,ω_K,α_1,…,α_K,β_1,…,β_K; for the skew-normal case the vector of initial values of the skewness parameters is appended. By default (initial=FALSE) the initial values are determined automatically by k-means clustering.

starts

If initial=TRUE, the sequence of initial values must be given here.
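As a sketch of how user-supplied initial values could be assembled in the documented order (weights, then α values, then β values), consider a two-component case. The numbers are made up for illustration, and the final call is left as a comment because it needs ForestFit and a data vector.

```r
# Build a starts vector in the documented order: omega_1..omega_K,
# alpha_1..alpha_K, beta_1..beta_K. All values below are illustrative.
K <- 2
omega0 <- c(0.6, 0.4)    # initial weights, must sum to one
alpha0 <- c(2.0, 4.0)    # initial shape parameters
beta0  <- c(5.0, 12.0)   # initial scale parameters
starts <- c(omega0, alpha0, beta0)

# Basic sanity checks before passing the vector on.
stopifnot(length(starts) == 3 * K)
stopifnot(abs(sum(starts[1:K]) - 1) < 1e-8)

# With ForestFit loaded and a data vector available, the call would be:
# fitmixture(data, "weibull", K, initial = TRUE, starts = starts)
```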

Details

Note that the mixture model is assumed to be identifiable. For the skew-normal case we have θ_j=(α_j,β_j,λ_j)^{T}, in which -∞<α_j<∞, β_j>0, and -∞<λ_j<∞ are the location, scale, and skewness parameters of the j-th component, respectively; see Azzalini (1985).
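For orientation, the skew-normal density of Azzalini (1985) with location α, scale β, and skewness λ can be written in base R as below. The function name dsn_sketch is a hypothetical helper introduced here, and the parameterization actually used inside the package is assumed, not confirmed, to match this form.

```r
# Skew-normal density: f(x) = (2 / beta) * phi(z) * Phi(lambda * z),
# with z = (x - alpha) / beta, phi the standard normal pdf and Phi its cdf.
# dsn_sketch is a hypothetical helper, not part of ForestFit.
dsn_sketch <- function(x, alpha, beta, lambda) {
  stopifnot(beta > 0)  # the scale parameter must be positive
  z <- (x - alpha) / beta
  2 / beta * dnorm(z) * pnorm(lambda * z)
}

# lambda = 0 recovers the ordinary normal density.
all.equal(dsn_sketch(1.3, alpha = 1, beta = 2, lambda = 0),
          dnorm(1.3, mean = 1, sd = 2))

# The density integrates to one for any skewness value.
integrate(dsn_sketch, -Inf, Inf, alpha = 1, beta = 2, lambda = 3)$value
```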

Value

  1. The output has three parts. The first part contains the vector of estimated weight, shape, and scale parameters.

  2. The second part contains a sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), Anderson-Darling (AD), Cramér-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood statistics.

  3. The last part of the output contains the clustering vector.
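The likelihood-based criteria in the second part can be related to the log-likelihood as in the sketch below. It uses the common textbook definitions; the helper info_criteria is hypothetical, and both the exact formulas and the parameter count used by the package are assumptions here (CAIC in particular has more than one definition in the literature).

```r
# Textbook information criteria from a maximized log-likelihood.
# loglik: maximized log-likelihood, p: number of free parameters, n: sample size.
# info_criteria is a hypothetical helper, not part of ForestFit.
info_criteria <- function(loglik, p, n) {
  c(AIC  = -2 * loglik + 2 * p,
    CAIC = -2 * loglik + p * (log(n) + 1),     # Bozdogan's consistent AIC (assumed form)
    BIC  = -2 * loglik + p * log(n),
    HQIC = -2 * loglik + 2 * p * log(log(n)))
}

# Illustrative numbers only: a 3-component, two-parameter-per-component mixture
# has 3 * 3 - 1 = 8 free parameters if the weights are counted as K - 1.
info_criteria(loglik = -100, p = 8, n = 50)
```

Smaller values indicate a better trade-off between fit and complexity, so these criteria are typically compared across candidate families or across values of K.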

Author(s)

Mahdi Teimouri

References

A. Azzalini, 1985. A class of distributions which includes the normal ones, Scandinavian Journal of Statistics, 12, 171-178.

A. P. Dempster, N. M. Laird, and D. B. Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B, 39, 1-38.

M. Teimouri, S. Rezakhah, and A. Mohammadpour, 2018. EM algorithm for symmetric stable mixture model, Communications in Statistics-Simulation and Computation, 47(2), 582-604.

Examples

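# Here we model the northern hardwood uneven-age forest data (HW$DIA), in inches, using a
# 3-component Weibull mixture distribution.
library(ForestFit)   # provides fitmixture() and the HW data set
data(HW)
data <- HW$DIA       # vector of observations, in inches
K <- 3               # number of mixture components
# initial = FALSE: starting values are chosen automatically by k-means clustering
fitmixture(data, "weibull", K, initial = FALSE)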

ForestFit documentation built on March 7, 2023, 8:27 p.m.