fitmixturegrouped: Estimating parameters of the well-known mixture models fitted...

View source: R/ForestFit.R

fitmixturegrouped R Documentation

Estimating parameters of the well-known mixture models fitted to the grouped data

Description

Estimates the parameters of gamma, log-normal, skew-normal, and Weibull mixture models fitted to grouped data using the expectation maximization (EM) algorithm. The general form of the cdf of a statistical mixture model is

F(x,Θ) = ∑_{k=1}^{K}ω_k F_k(x,θ_k),

where Θ=(θ_1,…,θ_K)^T is the whole parameter vector, θ_k=(α_k,β_k)^{T} is the parameter vector of the k-th component for k=1,…,K, F_k(.,θ_k) is the cdf of the k-th component, and the known constant K is the number of components. The weights ω_k are nonnegative and sum to one, i.e. ∑_{k=1}^{K}ω_k=1. The component families considered are gamma, log-normal, and Weibull, for which α_k and β_k are the shape and scale parameters, respectively; the skew-normal case is described in Details. Suppose that a sample of n independent observations, each following a distribution with cdf F, has been divided into m separate groups of the form (r_{i-1},r_i], for i=1,…,m. Then the likelihood function of the observed data is given by

L(Θ|f_1,…,f_m)=\frac{n!}{f_{1}!f_{2}!… f_{m}!}∏_{i=1}^{m}\Bigl[\frac{F_i(Θ)}{F(Θ)}\Bigr]^{f_i},

where

F_i(Θ)=∑_{k=1}^{K}ω_k\int_{r_{i-1}}^{r_i}f(x|θ_k)dx,

F(Θ)=∑_{k=1}^{K}ω_k\int_{r_0}^{r_m}f(x|θ_k)dx,

in which f(x|θ_k) denotes the pdf of the k-th component. The likelihood equation \partial L(Θ|f_1,…,f_m)/\partial Θ=0 is solved with the EM algorithm of Dempster et al. (1977) by introducing two sets of missing variables: the component label of each observation and its unobserved exact value within its group.
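As an illustration of the formulas above, the sketch below evaluates the mixture cdf and the grouped log-likelihood log L(Θ|f_1,…,f_m) for a Weibull mixture in base R. The helper names mixture_cdf and grouped_loglik are hypothetical and not part of ForestFit, and the Weibull cdf is assumed to be 1 - exp(-(x/β)^α), i.e. base R's pweibull(x, shape = α, scale = β). Because the mixture integrates componentwise, F_i(Θ) reduces to a difference of mixture cdf values at the group bounds.

```r
# Hypothetical helpers (not part of ForestFit). Assumes the Weibull cdf
# 1 - exp(-(x / beta)^alpha), i.e. pweibull(x, shape = alpha, scale = beta).

# Mixture cdf: F(x) = sum_k w_k * F_k(x; alpha_k, beta_k).
mixture_cdf <- function(x, weight, alpha, beta) {
  total <- numeric(length(x))
  for (k in seq_along(weight)) {
    total <- total + weight[k] * pweibull(x, shape = alpha[k], scale = beta[k])
  }
  total
}

# log L(Theta | f_1, ..., f_m) for groups (r_{i-1}, r_i], with r = (r_0, ..., r_m).
grouped_loglik <- function(r, f, weight, alpha, beta) {
  stopifnot(length(r) == length(f) + 1, all(diff(r) > 0))
  Fr <- mixture_cdf(r, weight, alpha, beta)
  Fi <- diff(Fr)                    # F_i(Theta) = F(r_i) - F(r_{i-1})
  Ftot <- Fr[length(Fr)] - Fr[1]    # F(Theta)   = F(r_m) - F(r_0)
  keep <- f > 0                     # empty groups contribute 0 * log(.) = 0
  lfactorial(sum(f)) - sum(lfactorial(f)) +
    sum(f[keep] * log(Fi[keep] / Ftot))
}
```

For example, a single component with shape 1 and scale 1 is the standard exponential, so grouped_loglik(c(0, 1, 2, Inf), c(3, 2, 1), 1, 1, 1) should agree with the multinomial log-probability dmultinom(c(3, 2, 1), prob = c(1 - exp(-1), exp(-1) - exp(-2), exp(-2)), log = TRUE).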

Usage

fitmixturegrouped(family, r, f, K, initial=FALSE, starts)

Arguments

family

Name of the family: one of "gamma", "log-normal", "skew-normal", or "weibull".

r

A numeric vector of length m+1. The first element of r is the lower bound of the first group, and the remaining m elements are the upper bounds of the m groups. Note that the upper bound of the (i-1)-th group is the lower bound of the i-th group, for i=2,…,m. The lower bound of the first group and the upper bound of the m-th group can be chosen arbitrarily; if raw data are available, the smallest and largest observations are used as the lower bound of the first group and the upper bound of the m-th group, respectively.

f

A numeric vector of length m containing the frequencies of the m groups.

K

Number of components.

initial

Logical. If FALSE (the default), the initial values are determined automatically by the k-means clustering method. If TRUE, the initial values must be supplied through starts.

starts

The sequence of initial values ω_1,…,ω_K,α_1,…,α_K,β_1,…,β_K, required when initial=TRUE. For the skew-normal case, the initial values of the skewness parameters are added to this vector.
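The sketch below shows how the r, f, and starts arguments fit together for raw data. To stay self-contained it uses base R only: the mixture sample is drawn by first sampling a component label and then a Weibull value (a stand-in for rmixture in the Examples section), and the group frequencies come from hist() rather than cut(). The starting values are illustrative guesses, not recommended defaults, and the final call is left as a comment because it requires ForestFit.

```r
# Simulated raw data, base R only: pick a component for each observation,
# then draw from that Weibull component (shape alpha, scale beta).
set.seed(1)
n <- 50; m <- 10; K <- 2
weight <- c(0.3, 0.7); alpha <- c(1, 2); beta <- c(2, 1)
comp <- sample(1:K, n, replace = TRUE, prob = weight)
x <- rweibull(n, shape = alpha[comp], scale = beta[comp])

# r: length m + 1; r[1] = min(x) is the lower bound of group 1 and
# r[i + 1] is the upper bound of group i, so adjacent groups share a bound.
r <- seq(min(x), max(x), length.out = m + 1)

# f: length m; hist() with its defaults (right = TRUE, include.lowest = TRUE)
# counts the groups (r_{i-1}, r_i], closing group 1 on the left to keep min(x).
f <- hist(x, breaks = r, plot = FALSE)$counts

# starts: weights, then shapes, then scales (illustrative values only).
starts <- c(c(0.5, 0.5), c(1.5, 1.5), c(1, 3))

# With ForestFit loaded, user-supplied initial values would be passed as:
# fitmixturegrouped("weibull", r, f, K, initial = TRUE, starts = starts)
```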

Details

The mixture models are assumed to be identifiable. For the skew-normal mixture model, the parameter vector of the k-th component takes the form θ_k=(α_k,β_k,λ_k)^{T}, where α_k, β_k, and λ_k denote the location, scale, and skewness parameters, respectively.
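The page does not spell out the skew-normal density. A common choice with location α_k, scale β_k, and skewness λ_k is Azzalini's form, sketched below; it is an assumption here, not confirmed from the ForestFit source, and the name dskewnorm is hypothetical. A K-component skew-normal mixture density would then be the weighted sum of such terms.

```r
# Hypothetical helper: skew-normal pdf with location a, scale b > 0 and
# skewness lambda, in Azzalini's form (assumed, not confirmed, for ForestFit):
#   f(x) = (2 / b) * phi((x - a) / b) * Phi(lambda * (x - a) / b)
dskewnorm <- function(x, a, b, lambda) {
  z <- (x - a) / b
  2 / b * dnorm(z) * pnorm(lambda * z)
}

# lambda = 0 recovers the normal density with mean a and sd b.
dskewnorm(0.3, a = 1, b = 2, lambda = 0)
dnorm(0.3, mean = 1, sd = 2)
```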

Value

The output has two parts:

  1. A vector of estimated weight, shape, and scale parameters.

  2. A sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn information criterion (HQIC), Anderson-Darling (AD), Cramér-von Mises (CVM), and Kolmogorov-Smirnov (KS) statistics, and the log-likelihood.
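The information criteria in the list above are functions of the maximized log-likelihood, the number of free parameters p, and the sample size n. The sketch below uses the standard textbook definitions (CAIC in Bozdogan's form); that ForestFit uses exactly these conventions, and n = sum(f) for grouped data, are assumptions. The helper name info_criteria is hypothetical, and the AD, CVM, and KS statistics are not covered.

```r
# Hypothetical helper: information criteria from a maximized log-likelihood,
# p free parameters and sample size n (standard definitions, assumed here).
info_criteria <- function(loglik, p, n) {
  c(AIC  = -2 * loglik + 2 * p,
    CAIC = -2 * loglik + p * (log(n) + 1),
    BIC  = -2 * loglik + p * log(n),
    HQIC = -2 * loglik + 2 * p * log(log(n)))
}

# A K-component mixture of two-parameter families has 3K - 1 free parameters:
# K - 1 weights (they sum to one), K shapes and K scales. For skew-normal
# components, adding K skewness parameters gives 4K - 1.
K <- 2
info_criteria(loglik = -80, p = 3 * K - 1, n = 50)  # illustrative log-likelihood
```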

Author(s)

Mahdi Teimouri

References

G. J. McLachlan and P. N. Jones, 1988. Fitting mixture models to grouped and truncated data via the EM algorithm, Biometrics, 44, 571-578

Examples

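library(ForestFit)
set.seed(1)         # for a reproducible sample
n<-50               # sample size
K<-2                # number of components
m<-10               # number of groups
weight<-c(0.3,0.7)
alpha<-c(1,2)       # shape parameters
beta<-c(2,1)        # scale parameters
param<-c(weight,alpha,beta)
# simulate n observations from a two-component Weibull mixture
data<-rmixture(n, "weibull", K, param)
# m+1 equally spaced group bounds spanning the observed range
r<-seq(min(data),max(data),length=m+1)
# group frequencies; right=FALSE gives groups [r[i-1], r[i]), and
# include.lowest=TRUE also closes the last group so max(data) is counted
D<-data.frame(table(cut(data,r,labels=NULL,include.lowest=TRUE,right=FALSE,dig.lab=4)))
f<-D$Freq
fitmixturegrouped("weibull",r,f,K,initial=FALSE)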
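As a rough cross-check that needs only base R, the same grouped likelihood can also be maximized directly with optim() (Nelder-Mead). This is plainly a different technique from the EM algorithm used by fitmixturegrouped, so its estimates may differ, especially since mixture likelihoods can have several local maxima and the component labels may come out swapped. The helper names unpack and negll, the starting values, and the Weibull parameterization of pweibull are assumptions of this sketch.

```r
set.seed(2)
n <- 200; m <- 10
comp <- sample(1:2, n, replace = TRUE, prob = c(0.3, 0.7))
x <- rweibull(n, shape = c(1, 2)[comp], scale = c(2, 1)[comp])
r <- seq(min(x), max(x), length.out = m + 1)
f <- hist(x, breaks = r, plot = FALSE)$counts

# Map an unconstrained vector to (weight, alpha, beta) for K = 2:
# logistic transform for w_1, exp() for the positive shapes and scales.
unpack <- function(par) {
  w1 <- plogis(par[1])
  list(weight = c(w1, 1 - w1), alpha = exp(par[2:3]), beta = exp(par[4:5]))
}

# Negative grouped log-likelihood (the constant multinomial term is dropped).
negll <- function(par) {
  p <- unpack(par)
  Fi <- numeric(m)
  for (k in 1:2) {
    Fi <- Fi + p$weight[k] * diff(pweibull(r, p$alpha[k], p$beta[k]))
  }
  keep <- f > 0
  val <- -sum(f[keep] * log(Fi[keep] / sum(Fi)))
  if (is.finite(val)) val else Inf  # let Nelder-Mead reject invalid regions
}

start <- c(0, log(1.5), log(1.5), log(0.8), log(1.5))
fit <- optim(start, negll, control = list(maxit = 5000))
unpack(fit$par)  # estimated weights, shapes and scales
```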

ForestFit documentation built on March 7, 2023, 8:27 p.m.