fitmixturegrouped | R Documentation
Estimates the parameters of gamma, log-normal, and Weibull mixture models fitted to grouped data using the expectation-maximization (EM) algorithm. The general form of the cdf of a statistical mixture model is
F(x,Θ) = ∑_{k=1}^{K}ω_k F_k(x,θ_k),
where Θ=(θ_1,…,θ_K)^T is the whole parameter vector, θ_k=(α_k,β_k)^T for k=1,…,K is the parameter vector of the k-th component, F_k(·,θ_k) is the cdf of the k-th component, and the known constant K is the number of components. Parameters α and β are the shape and scale parameters, respectively. The weights ω_k are nonnegative and sum to one, i.e. ∑_{k=1}^{K}ω_k=1. The families considered for the component cdfs F_k are gamma, log-normal, and Weibull. Suppose that a sample of n independent observations, each following a distribution with cdf F, has been divided into m separate groups of the form (r_{i-1},r_i], for i=1,…,m. Then the likelihood function of the observed data is
L(Θ|f_1,…,f_m)=\frac{n!}{f_{1}!f_{2}!… f_{m}!}∏_{i=1}^{m}\Bigl[\frac{F_i(Θ)}{F(Θ)}\Bigr]^{f_i},
where
F_i(Θ)=∑_{k=1}^{K}ω_k\int_{r_{i-1}}^{r_i}f(x|θ_k)dx,
F(Θ)=∑_{k=1}^{K}ω_k\int_{r_0}^{r_m}f(x|θ_k)dx,
in which f(x|θ_k) denotes the pdf of the k-th component. Following McLachlan and Jones (1988), the EM algorithm of Dempster et al. (1977) is used to solve \partial L(Θ|f_1,…,f_m)/\partial Θ=0 by introducing two sets of missing variables.
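The likelihood above can be evaluated directly from the group boundaries r and the frequencies f. The sketch below assumes Weibull components with base R's pweibull() parameterization (α as shape, β as scale); the helper names mix_cdf, group_probs, grouped_loglik, and em_weight_step are hypothetical and not part of the package. em_weight_step performs one EM update of the weights only, treating the unobserved counts outside (r_0, r_m] as missing data; the shape and scale updates, which need conditional expectations within each group, are omitted.

```r
# Hypothetical helpers (not part of the package) for a K-component Weibull
# mixture with weights w, shapes alpha and scales beta (base R pweibull()).

# Mixture cdf F(x) = sum_k w_k F_k(x), evaluated at each element of x
mix_cdf <- function(x, w, alpha, beta) {
  vapply(x, function(xx) sum(w * pweibull(xx, shape = alpha, scale = beta)),
         numeric(1))
}

# m x K matrix: P[i, k] = probability that component k falls in (r_{i-1}, r_i]
group_probs <- function(r, alpha, beta) {
  P <- vapply(seq_along(alpha),
              function(k) diff(pweibull(r, shape = alpha[k], scale = beta[k])),
              numeric(length(r) - 1))
  matrix(P, nrow = length(r) - 1)
}

# Grouped-data log-likelihood log L(Theta | f_1, ..., f_m)
grouped_loglik <- function(r, f, w, alpha, beta) {
  Fi <- drop(group_probs(r, alpha, beta) %*% w)  # F_i(Theta)
  Ftot <- sum(Fi)                                 # F(Theta) on (r_0, r_m]
  n <- sum(f)
  lfactorial(n) - sum(lfactorial(f)) + sum(f * log(Fi / Ftot))
}

# One EM update of the weights with alpha and beta held fixed
em_weight_step <- function(r, f, w, alpha, beta) {
  P <- group_probs(r, alpha, beta)
  num <- sweep(P, 2, w, `*`)   # w_k * P[i, k]
  tau <- num / rowSums(num)    # E-step: P(component k | observation in group i)
  n <- sum(f)
  Ftot <- sum(num)             # F(Theta)
  Pk <- colSums(P)             # mass of component k inside (r_0, r_m]
  # Observed counts f_i * tau[i, k] plus the expected counts of component k
  # outside (r_0, r_m], divided by the expected total count n / Ftot
  Ftot * colSums(f * tau) / n + w * (1 - Pk)
}
```

Because this is an exact maximization over the weights with α and β fixed, calling em_weight_step repeatedly should never decrease grouped_loglik, which is a convenient check on an EM implementation.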
fitmixturegrouped(family, r, f, K, initial=FALSE, starts)
| Argument | Description |
| --- | --- |
| family | Name of the family including: " |
| r | A numeric vector of length m+1. The first element of r is the lower bound of the first group, and the remaining m elements are the upper bounds of the m groups. Note that the upper bound of the (i-1)-th group is the lower bound of the i-th group, for i=2,…,m. The lower bound of the first group and the upper bound of the m-th group can be chosen arbitrarily; if the raw data are available, the smallest and largest observations are used for them, respectively. |
| f | A numeric vector of length m containing the group frequencies. |
| K | Number of components. |
| initial | The sequence of initial values ω_1,…,ω_K,α_1,…,α_K,β_1,…,β_K. For the skew-normal case, the vector of initial values of the skewness parameters is appended. By default, the initial values are determined automatically by k-means clustering. |
| starts | If |
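By default the starting values come from k-means clustering. One way this could be done for grouped data is sketched below; it is an assumption, not the package's code. Each group midpoint is repeated according to its frequency, the resulting pseudo-sample is clustered with stats::kmeans(), and gamma method-of-moments estimates are computed within each cluster. The within-cluster variance adds h^2/12 per observation (h is the group width), which is what a uniform spread inside each group would contribute. kmeans_starts is a hypothetical name; the returned vector follows the order ω_1,…,ω_K, α_1,…,α_K, β_1,…,β_K described for initial.

```r
# Hypothetical sketch: k-means starting values from grouped data, assuming
# gamma components and method-of-moments estimates inside each cluster.
kmeans_starts <- function(r, f, K) {
  mid <- (head(r, -1) + tail(r, -1)) / 2  # group midpoints
  h <- diff(r)                            # group widths
  x <- rep(mid, times = f)                # midpoint i repeated f_i times
  hx <- rep(h, times = f)
  cl <- kmeans(x, centers = K, nstart = 10)$cluster
  w <- alpha <- beta <- numeric(K)
  for (k in seq_len(K)) {
    in_k <- cl == k
    mk <- mean(x[in_k])
    # Law of total variance, assuming values spread uniformly within groups
    vk <- mean((x[in_k] - mk)^2) + mean(hx[in_k]^2) / 12
    w[k] <- mean(in_k)                    # cluster proportion
    alpha[k] <- mk^2 / vk                 # gamma method-of-moments shape
    beta[k] <- vk / mk                    # gamma method-of-moments scale
  }
  o <- order(alpha * beta)                # sort components by their means
  c(w[o], alpha[o], beta[o])
}
```

Sorting by the component means gives a stable labelling across runs, since k-means cluster labels are otherwise arbitrary.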
The mixture models are assumed to be identifiable. For the skew-normal mixture model, the parameter vector of the k-th component takes the form θ_k=(α_k,β_k,λ_k)^T, where α_k, β_k, and λ_k denote the location, scale, and skewness parameters, respectively.
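For the skew-normal family, the component density can be sketched under Azzalini's parameterization, f(x)=(2/β)φ(z)Φ(λz) with z=(x-α)/β; whether the package uses exactly this form is an assumption. Setting λ=0 recovers the normal density. dskewnorm and dmix_skewnorm are hypothetical helpers.

```r
# Hypothetical helper: skew-normal pdf with location alpha, scale beta and
# skewness lambda, assuming Azzalini's form 2/beta * phi(z) * Phi(lambda z)
dskewnorm <- function(x, alpha, beta, lambda) {
  z <- (x - alpha) / beta
  2 / beta * dnorm(z) * pnorm(lambda * z)
}

# Mixture pdf sum_k w_k f(x | alpha_k, beta_k, lambda_k)
dmix_skewnorm <- function(x, w, alpha, beta, lambda) {
  out <- numeric(length(x))
  for (k in seq_along(w)) {
    out <- out + w[k] * dskewnorm(x, alpha[k], beta[k], lambda[k])
  }
  out
}
```

For example, dmix_skewnorm(x, c(0.4, 0.6), c(0, 3), c(1, 2), c(-2, 4)) evaluates a two-component mixture with a left-skewed and a right-skewed component.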
The output has two parts. The first part is the vector of estimated weight, shape, and scale parameters.
The second part is a sequence of goodness-of-fit measures consisting of the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), Anderson-Darling (AD), Cramér-von Mises (CVM), Kolmogorov-Smirnov (KS), and log-likelihood (log-likelihood) statistics.
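The information criteria listed above are simple functions of the maximized log-likelihood ℓ, the number of free parameters p, and the sample size n. The sketch below uses the standard definitions, with CAIC in its consistent form -2ℓ + p(log n + 1); the exact parameter count and sample size the package plugs in are assumptions. For a K-component gamma, log-normal, or Weibull mixture, p = 3K-1 (K-1 free weights, K shapes, K scales). info_criteria is a hypothetical helper.

```r
# Hypothetical helper: information criteria from the maximized log-likelihood
# (loglik), the number of free parameters (p) and the sample size (n).
info_criteria <- function(loglik, p, n) {
  c(AIC  = -2 * loglik + 2 * p,
    CAIC = -2 * loglik + p * (log(n) + 1),
    BIC  = -2 * loglik + p * log(n),
    HQIC = -2 * loglik + 2 * p * log(log(n)))
}
```

For a two-component fit to n = sum(f) grouped observations, a call would look like info_criteria(loglik, p = 3 * 2 - 1, n = sum(f)); smaller values indicate a better trade-off between fit and complexity.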
Mahdi Teimouri
McLachlan, G. J. and Jones, P. N. (1988). Fitting mixture models to grouped and truncated data via the EM algorithm. Biometrics, 44, 571-578.
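library(ForestFit)
# Simulate n observations from a two-component Weibull mixture
n <- 50
K <- 2
m <- 10
weight <- c(0.3, 0.7)
alpha <- c(1, 2)
beta <- c(2, 1)
param <- c(weight, alpha, beta)
data <- rmixture(n, "weibull", K, param)
# Group the data into m classes: boundaries r and frequencies f
r <- seq(min(data), max(data), length.out = m + 1)
D <- data.frame(table(cut(data, r, labels = NULL, include.lowest = TRUE, right = FALSE, dig.lab = 4)))
f <- D$Freq
fitmixturegrouped("weibull", r, f, K, initial = FALSE)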