fit.FMMSNC: Fitting Finite Mixture of Multivariate Distributions. In CensMFM: Finite Mixture of Multivariate Censored/Missing Data

Description

Fits a finite mixture of censored and/or missing multivariate distributions (FM-MC). The available distributions are the multivariate skew-normal, normal and Student-t. It uses an EM-type algorithm to iteratively compute maximum likelihood estimates of the parameters.

Usage

```
fit.FMMSNC(cc, LI, LS, y, mu = NULL, Sigma = NULL, shape = NULL, pii = NULL,
           nu = NULL, g = NULL, get.init = TRUE, criteria = TRUE, family = "SN",
           error = 1e-05, iter.max = 350, uni.Gama = FALSE, kmeans.param = NULL,
           cal.im = FALSE)
```

Arguments

- `cc`: censoring indicators, one per entry of `y` (an n x p matrix in the examples): 0 if the entry is non-censored and 1 if it is censored. Missing entries are also flagged with 1, as shown in the examples.
- `LI`: the n x p matrix of lower limits. See the examples.
- `LS`: the n x p matrix of upper limits. See the examples.
- `y`: the n x p response matrix.
- `mu`: a list with g entries, each a vector of dimension p giving the location parameter of one group.
- `Sigma`: a list with g entries, each a p x p matrix giving the scale parameter of one group.
- `shape`: a list with g entries, each a vector of dimension p giving the skewness parameter of one group.
- `pii`: a vector of g mixture weights, one per cluster. The weights must sum to one.
- `nu`: the degrees of freedom for the Student-t case, a vector of dimension g.
- `g`: the number of mixture components.
- `get.init`: logical. If `TRUE`, the function computes the initial values; if `FALSE`, the user must supply them.
- `criteria`: logical. If `TRUE`, the likelihood-based model selection criteria AIC, BIC and EDC are computed for comparison purposes.
- `family`: the distribution family to be used: skew-normal (`"SN"`), normal (`"Normal"`) or Student-t (`"t"`).
- `error`: the relative error used in the stopping criterion of the algorithm. See Details.
- `iter.max`: the maximum number of iterations of the EM algorithm.
- `uni.Gama`: logical. If `TRUE`, the scale matrices of all groups are assumed to be equal.
- `kmeans.param`: a list of alternative parameters for the `kmeans` function used to generate initial values. The default is `list(iter.max = 10, n.start = 1, algorithm = "Hartigan-Wong")`.
- `cal.im`: logical. If `TRUE`, the information matrix is computed and the standard errors are reported.
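The examples at the end of this page encode a left-censored entry as `LI = -Inf`, `LS = y` and a missing entry as `LI = -Inf`, `LS = Inf`, which suggests that each flagged entry of `y` is treated as lying in the interval `[LI, LS]`. The sketch below builds `cc`, `LI` and `LS` under that assumption. The helper `make_limits` and its arguments are hypothetical (not part of CensMFM), and the right-censoring encoding `LI = y`, `LS = Inf` is an extrapolation from the same interval reading, not something this page documents.

```r
# Hypothetical helper (not part of CensMFM): builds cc, LI and LS assuming that
# each flagged entry y[i, j] is only known to lie in the interval [LI, LS].
make_limits <- function(y, left = NULL, right = NULL, missing = NULL) {
  none <- matrix(FALSE, nrow(y), ncol(y))
  if (is.null(left)) left <- none        # masks must be NA-free logical matrices
  if (is.null(right)) right <- none
  if (is.null(missing)) missing <- none
  cc <- LI <- LS <- matrix(0, nrow(y), ncol(y))  # observed entries keep 0, as in the example
  # Left censoring: the true value is at most the recorded limit y[i, j]
  cc[left] <- 1; LI[left] <- -Inf; LS[left] <- y[left]
  # Right censoring (assumed encoding): the true value is at least y[i, j]
  cc[right] <- 1; LI[right] <- y[right]; LS[right] <- Inf
  # Missing: no information, so the interval is the whole real line
  cc[missing] <- 1; LI[missing] <- -Inf; LS[missing] <- Inf
  list(cc = cc, LI = LI, LS = LS)
}

# Toy data: y[2, 1] is left-censored at -0.2 and y[1, 2] is missing
y <- matrix(c(1.5, -0.2, NA, 0.7), nrow = 2)
lim <- make_limits(y, left = !is.na(y) & y < 0, missing = is.na(y))
```

The three returned matrices can then be passed as `cc`, `LI` and `LS` together with `y`.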

Details

The information matrix is computed with respect to the entries of the square-root matrix of `Sigma`, using the empirical information matrix. Disclaimer: the inference is asymptotic, so the reported standard errors should only be relied on for reasonably large sample sizes. The algorithm stops when `abs(loglik[k+1]/loglik[k] - 1) < error`, where `loglik[k]` is the log-likelihood at iteration k, or when `iter.max` iterations are reached.
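The stopping rule and the empirical information matrix can be sketched in R as follows. Both functions are hypothetical illustrations, not the package's internal code: `em_converged` applies the relative-change test to two successive log-likelihood values, and `empirical_se` follows the usual empirical-information recipe (sum of outer products of per-observation score vectors, then square roots of the diagonal of its inverse). How CensMFM computes the scores themselves, with respect to the square root of `Sigma`, is not reproduced here.

```r
# Hypothetical sketch of the EM stopping rule:
# converged when |loglik_new / loglik_old - 1| < error.
em_converged <- function(loglik_new, loglik_old, error = 1e-05) {
  abs(loglik_new / loglik_old - 1) < error
}

# Hypothetical sketch of standard errors from the empirical information matrix:
# scores is an n x d matrix whose i-th row is the score vector of observation i.
empirical_se <- function(scores) {
  info <- crossprod(scores)   # sum over i of s_i %*% t(s_i), a d x d matrix
  sqrt(diag(solve(info)))     # square roots of the diagonal of the inverse
}

# Toy log-likelihood trace that settles down over the iterations
ll_trace <- c(-500, -450, -440, -439.9, -439.899)
iter <- 1
while (iter < length(ll_trace) &&
       !em_converged(ll_trace[iter + 1], ll_trace[iter], error = 1e-05)) {
  iter <- iter + 1
}
```

With `error = 1e-05` the loop stops at `iter == 4`, the first step whose relative change (about 2.3e-06) falls below the tolerance. In a real fit the loop would also stop once `iter.max` is reached.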

Value

Returns a list containing, depending on the case, one or more of the following objects:

- `mu`: a list with g components, each a vector of dimension p with the estimated location parameter.
- `Sigma`: a list with g components, each a p x p matrix with the estimated scale matrix.
- `Gamma`: a list with g components, each a p x p matrix with the estimated Γ scale matrix.
- `shape`: a list with g components, each a vector of dimension p with the estimated skewness parameter.
- `nu`: a vector with one element containing the value of the degrees of freedom parameter ν.
- `pii`: a vector with g elements containing the estimated mixture weights.
- `Zij`: an n x g matrix containing the estimated weights of each subject for each group.
- `yest`: an n x p matrix containing the estimated values of y.
- `MI`: a list with the standard errors of all parameters.
- `logLik`: the log-likelihood value at the estimated parameters.
- `aic`: the AIC value at the estimated parameters.
- `bic`: the BIC value at the estimated parameters.
- `edc`: the EDC value at the estimated parameters.
- `iter`: the number of iterations until the EM algorithm converged.
- `group`: an n x p matrix containing the classification of the subjects into groups.
- `time`: the time in minutes until the EM algorithm converged.
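The criteria `aic`, `bic` and `edc` are functions of `logLik`, the number of free parameters d and the sample size n. The sketch below assumes the common definitions AIC = -2 logLik + 2d and BIC = -2 logLik + d log(n), and EDC = -2 logLik + 0.2 sqrt(n) d, where the EDC penalty 0.2 sqrt(n) is an assumption borrowed from the mixsmsn package. It also assumes d counts, per component, a location vector, a symmetric scale matrix and (for the skew-normal) a shape vector, plus g - 1 free weights. Both function names are hypothetical, and the package's own count may differ (for example with `uni.Gama = TRUE` or `family = "t"`).

```r
# Hypothetical helper: number of free parameters of a g-component mixture in p
# dimensions: location (p) + symmetric scale matrix (p(p+1)/2) + shape (p, skew-normal
# only) per component, plus g - 1 free mixture weights.
n_params <- function(g, p, skew = TRUE) {
  per_comp <- p + p * (p + 1) / 2 + if (skew) p else 0
  g * per_comp + (g - 1)
}

# Hypothetical helper: likelihood-based selection criteria (smaller is better).
# The EDC penalty 0.2 * sqrt(n) is an assumption taken from mixsmsn.
info_criteria <- function(loglik, d, n) {
  c(aic = -2 * loglik + 2 * d,
    bic = -2 * loglik + log(n) * d,
    edc = -2 * loglik + 0.2 * sqrt(n) * d)
}

# Two-component bivariate skew-normal mixture, as in the example (n = 200)
d <- n_params(g = 2, p = 2)
crit <- info_criteria(loglik = -800, d = d, n = 200)
```

Here d is 15 for the skew-normal mixture and 11 for its normal counterpart, so the extra shape parameters are penalized when comparing `family = "SN"` against `family = "Normal"`.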

Note

The `uni.Gama` parameter refers to the Γ matrix for the skew-normal distribution, while for the normal and Student-t distributions it refers to the Σ matrix.
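The relation between Γ and Σ is not spelled out on this page. Under the skew-normal parametrization of Cabral, Lachos and Prates (2012), which we assume CensMFM follows, δ = λ/sqrt(1 + λ'λ), Δ = Σ^{1/2} δ and Γ = Σ - ΔΔ'. The hypothetical helpers below compute Γ from `Sigma` and `shape` under that assumption, using the symmetric square root of Σ.

```r
# Hypothetical helper: symmetric square root of a positive-definite matrix
sqrtm_sym <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values), nrow(S)) %*% t(e$vectors)
}

# Hypothetical helper, assuming the parametrization
#   delta = shape / sqrt(1 + t(shape) %*% shape)
#   Delta = Sigma^{1/2} %*% delta
#   Gamma = Sigma - Delta %*% t(Delta)
sigma_to_gamma <- function(Sigma, shape) {
  delta <- shape / sqrt(1 + sum(shape^2))
  Delta <- sqrtm_sym(Sigma) %*% delta
  list(Gamma = Sigma - Delta %*% t(Delta), Delta = drop(Delta))
}

# First component of the example below
Sigma1 <- matrix(c(3, 1, 1, 4.5), 2, 2)
res <- sigma_to_gamma(Sigma1, c(-2, 2))
```

Because δ'δ < 1, Γ stays positive definite, and Σ is recovered as Γ + ΔΔ'. Under this reading, `uni.Gama = TRUE` in the skew-normal case would constrain Γ, not Σ, to be common across groups.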

Author(s)

Francisco H. C. de Alencar hildemardealencar@gmail.com, Christian E. Galarza cgalarza88@gmail.com, Victor Hugo Lachos hlachos@uconn.edu and Larissa A. Matos larissam@ime.unicamp.br

Maintainer: Francisco H. C. de Alencar hildemardealencar@gmail.com

References

Cabral, C. R. B., Lachos, V. H., & Prates, M. O. (2012). Multivariate mixture modeling using skew-normal independent distributions. Computational Statistics & Data Analysis, 56(1), 126-142.

Prates, M. O., Lachos, V. H., & Cabral, C. (2013). mixsmsn: Fitting finite mixture of scale mixture of skew-normal distributions. Journal of Statistical Software, 54(12), 1-20.

Galarza, C. E., Matos, L. A., Dey, D. K., & Lachos, V. H. (2019). On moments of folded and truncated multivariate extended skew-normal distributions. Technical report ID 19-14, University of Connecticut.

de Alencar, F. H. C., Galarza, C. E., Matos, L. A., & Lachos, V. H. (2019). Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution. Technical report ID 19-31, University of Connecticut.

See Also

`rMSN`, `rMMSN` and `rMMSN.contour`

Examples
```
library(CensMFM)

mu <- Sigma <- shape <- list()
mu[[1]] <- c(-3, -4)
mu[[2]] <- c(2, 2)
Sigma[[1]] <- matrix(c(3, 1, 1, 4.5), 2, 2)
Sigma[[2]] <- matrix(c(2, 1, 1, 3.5), 2, 2)
shape[[1]] <- c(-2, 2)
shape[[2]] <- c(-3, 4)
nu <- c(0, 0)
pii <- c(0.6, 0.4)
percen <- c(0.1, 0.2)
n <- 200
g <- 2
seed <- 654678
set.seed(seed)
test <- rMMSN(n = n, pii = pii, mu = mu, Sigma = Sigma, shape = shape,
              percen = percen, each = TRUE, family = "SN")
Zij <- test$G
cc <- test$cc
y <- test$y

## left censoring ##
LI <- cc
LS <- cc
LI[cc == 1] <- -Inf
LS[cc == 1] <- y[cc == 1]

# full analysis may take a few seconds more...
test_fit.cc0 <- fit.FMMSNC(cc, LI, LS, y, mu = mu, Sigma = Sigma, shape = shape,
                           pii = pii, g = 2, get.init = FALSE, criteria = TRUE,
                           family = "Normal", error = 0.0001, iter.max = 200,
                           uni.Gama = FALSE, cal.im = FALSE)
test_fit.cc <- fit.FMMSNC(cc, LI, LS, y, mu = mu, Sigma = Sigma, shape = shape,
                          pii = pii, g = 2, get.init = FALSE, criteria = TRUE,
                          family = "SN", error = 0.00001, iter.max = 350,
                          uni.Gama = FALSE, cal.im = TRUE)

## missing data ##
pctmiss <- 0.2   # 20% of missing data in the whole data
p <- ncol(y)     # number of response variables (equal to g only by coincidence here)
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA
cc <- matrix(nrow = n, ncol = p)
cc[missing] <- 1
cc[!missing] <- 0
LI <- cc
LS <- cc
LI[cc == 1] <- -Inf
LS[cc == 1] <- +Inf
test_fit.mis <- fit.FMMSNC(cc, LI, LS, y, mu = mu, Sigma = Sigma, shape = shape,
                           pii = pii, g = 2, get.init = FALSE, criteria = TRUE,
                           family = "SN", error = 0.00001, iter.max = 350,
                           uni.Gama = FALSE, cal.im = TRUE)
```