# betamix: Finite Mixtures of Beta Regression for Rates and Proportions

## Description

Fit finite mixtures of beta regression models for rates and proportions by maximum likelihood using the EM algorithm. Each component is parametrized by a mean, which depends on the covariates through a link function, and a precision parameter (called phi).
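The mean/precision parametrization can be illustrated in base R. The sketch below assumes the usual mapping to the shape parameters of `dbeta`, namely `shape1 = mu * phi` and `shape2 = (1 - mu) * phi`; the helper names `dbeta_mp` and `dmix` are hypothetical and are not part of betareg.

```r
## Beta density in the mean/precision parametrization
## (assumed mapping: shape1 = mu * phi, shape2 = (1 - mu) * phi).
dbeta_mp <- function(y, mu, phi) {
  dbeta(y, shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

## Density of a finite mixture of beta distributions with
## mixing weights `prior` (one mu and one phi per component).
dmix <- function(y, mu, phi, prior) {
  stopifnot(length(mu) == length(phi), length(mu) == length(prior))
  dens <- vapply(seq_along(prior),
    function(j) prior[j] * dbeta_mp(y, mu[j], phi[j]),
    numeric(length(y)))
  rowSums(matrix(dens, nrow = length(y)))
}

## With a logit link, the mean follows from the linear predictor.
## The coefficients 0.5 and 1.2 are made up for illustration.
x  <- c(-1, 0, 1)
mu <- plogis(0.5 + 1.2 * x)

## Mixture density on a small grid, two components with weights 0.4 and 0.6.
dmix(c(0.25, 0.5, 0.75), mu = c(0.3, 0.8), phi = c(10, 20), prior = c(0.4, 0.6))
```

Under this mapping a larger `phi` gives a smaller variance for the same mean, which is why phi is referred to as a precision parameter.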

## Usage

```r
betamix(formula, data, k, subset, na.action, weights, offset,
  link = c("logit", "probit", "cloglog", "cauchit", "log", "loglog"),
  link.phi = "log", control = betareg.control(...), cluster = NULL,
  FLXconcomitant = NULL, FLXcontrol = list(), verbose = FALSE,
  nstart = if (is.null(cluster)) 3 else 1, which = "BIC",
  ID, fixed, extra_components, ...)

extraComponent(type = c("uniform", "betareg"), coef, delta,
  link = "logit", link.phi = "log")
```

## Arguments

- `formula`: symbolic description of the model (of type `y ~ x` or `y ~ x | z`; for details see `betareg`).
- `data, subset, na.action`: arguments controlling formula processing via `model.frame`.
- `weights`: optional numeric vector of integer case weights.
- `offset`: optional numeric vector with an a priori known component to be included in the linear predictor for the mean.
- `k`: a vector of integers indicating the number of components of the finite mixture; passed in turn to the `k` argument of `stepFlexmix`.
- `link`: character specification of the link function in the mean model (mu). Currently, `"logit"`, `"probit"`, `"cloglog"`, `"cauchit"`, `"log"`, `"loglog"` are supported. Alternatively, an object of class `"link-glm"` can be supplied.
- `link.phi`: character specification of the link function in the precision model (phi). Currently, `"identity"`, `"log"`, `"sqrt"` are supported. The default is `"log"` unless `formula` is of type `y ~ x` where the default is `"identity"` (for backward compatibility). Alternatively, an object of class `"link-glm"` can be supplied.
- `control`: a list of control arguments specified via `betareg.control`.
- `cluster`: either a matrix with `k` columns of initial cluster membership probabilities for each observation, or a factor or integer vector with the initial cluster assignments of observations at the start of the EM algorithm. Default is random assignment into `k` clusters.
- `FLXconcomitant`: concomitant variable model; object of class `FLXP`. Default is the object returned by calling `FLXPconstant`. The argument `FLXconcomitant` can be omitted if `formula` is a three-part formula of type `y ~ x | z | w`, where `w` specifies the concomitant variables.
- `FLXcontrol`: object of class `"FLXcontrol"` or a named list; controls the EM algorithm and is passed in turn to the `control` argument of `flexmix`.
- `verbose`: a logical; if `TRUE`, progress information is shown for different starts of the EM algorithm.
- `nstart`: for each value of `k`, run `stepFlexmix` `nstart` times and keep only the solution with maximum likelihood.
- `which`: number of the model to get if `k` is a vector of integers longer than one. If character, interpreted as number of components or name of an information criterion.
- `ID`: grouping variable indicating if observations are from the same individual, i.e., the component membership is restricted to be the same for these observations.
- `fixed`: symbolic description of the model for the parameters fixed over components (of type `~ x | z`).
- `extra_components`: a list containing objects returned by `extraComponent()`.
- `...`: arguments passed to `betareg.control`.
- `type`: specifies if the component follows a uniform distribution or a beta regression model.
- `coef`: a vector with the coefficients to determine the midpoint of the uniform distribution, or a named list with the coefficients for the mean and precision of the beta regression model.
- `delta`: numeric; half-length of the interval of the uniform distribution.

## Details

The arguments and the model specification are similar to `betareg`. Internally `stepFlexmix` is called with suitable arguments to fit the finite mixture model with the EM algorithm. See Grün et al. (2012) for more details.
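To make the role of the EM algorithm more concrete, the following base R sketch computes one E-step for a mixture of beta distributions: the posterior probability that each observation belongs to each component. It is a simplified illustration, not the flexmix implementation. The function name `e_step` is hypothetical, and each component is assumed to have a constant mean (intercept only) rather than a mean that varies with covariates.

```r
## One E-step for a beta mixture: posterior component probabilities.
## Assumes constant mu and phi per component (no covariates).
e_step <- function(y, mu, phi, prior) {
  dens <- vapply(seq_along(prior),
    function(j) prior[j] * dbeta(y, mu[j] * phi[j], (1 - mu[j]) * phi[j]),
    numeric(length(y)))
  dens <- matrix(dens, nrow = length(y))
  ## normalize so that each row sums to one
  dens / rowSums(dens)
}

## Made-up observations and parameter values for illustration.
y <- c(0.2, 0.5, 0.9)
post <- e_step(y, mu = c(0.3, 0.8), phi = c(10, 20), prior = c(0.5, 0.5))

## Hard assignment: the component with the largest posterior probability.
hard <- apply(post, 1, which.max)
```

In a full EM iteration these posterior probabilities would serve as weights when re-estimating the component parameters in the M-step. A matrix of this shape is also what the `cluster` argument accepts as initial membership probabilities.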

`extra_components` is a list in which each element corresponds to a component whose parameters are fixed a priori.
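For a uniform extra component, `coef` gives the midpoint and `delta` the half-length of the interval, so the support is taken here to be `[coef - delta, coef + delta]`. The base R sketch below works this out for the values used in the examples (`coef = 0.99`, `delta = 0.01`). The variable names are hypothetical, and how betareg treats the interval endpoints internally is not covered.

```r
## Uniform extra component: midpoint and half-length of the interval.
midpoint <- 0.99
delta    <- 0.01

## Implied support of the component.
lower <- midpoint - delta
upper <- midpoint + delta

## Density of the component at a few accuracy values:
## zero outside the interval, constant 1 / (2 * delta) inside.
d <- dunif(c(0.5, 0.985, 0.99), min = lower, max = upper)
```

Because the interval is so narrow, this component places all of its mass on scores very close to 1, which matches its use in the examples as a component for perfect reading scores.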

## Value

An object of class `"flexmix"` containing the best model with respect to the log likelihood or the one selected according to `which` if `k` is a vector of integers longer than 1.

## Author(s)

Bettina Grün and Achim Zeileis

## References

Cribari-Neto, F., and Zeileis, A. (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. http://www.jstatsoft.org/v34/i02/.

Grün, B., Kosmidis, I., and Zeileis, A. (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, 48(11), 1–25. http://www.jstatsoft.org/v48/i11/.

Grün, B., and Leisch, F. (2008). FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters. Journal of Statistical Software, 28(4), 1–35. http://www.jstatsoft.org/v28/i04/.

Leisch, F. (2004). FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R. Journal of Statistical Software, 11(8), 1–18. http://www.jstatsoft.org/v11/i08/.

## See Also

`betareg`, `flexmix`, `stepFlexmix`

## Examples

```r
library("betareg")
options(digits = 4)

## data with two groups of dyslexic and non-dyslexic children
data("ReadingSkills", package = "betareg")

set.seed(4040)
## try to capture accuracy ~ iq relationship (without using dyslexia
## information) using two beta regression components and one additional
## extra component for a perfect reading score
rs_mix <- betamix(accuracy ~ iq, data = ReadingSkills, k = 3,
  nstart = 10, extra_components = extraComponent(type = "uniform",
  coef = 0.99, delta = 0.01))

## visualize result
## intensities based on posterior probabilities
prob <- 2 * (posterior(rs_mix)[cbind(1:nrow(ReadingSkills),
  clusters(rs_mix))] - 0.5)
## associated HCL colors
col0 <- hcl(c(260, 0, 130), 65, 45, fixup = FALSE)
col1 <- col0[clusters(rs_mix)]
col2 <- hcl(c(260, 0, 130)[clusters(rs_mix)], 65 * abs(prob)^1.5,
  95 - 50 * abs(prob)^1.5, fixup = FALSE)
## scatter plot
plot(accuracy ~ iq, data = ReadingSkills, col = col2, pch = 19,
  cex = 1.5, xlim = c(-2, 2))
points(accuracy ~ iq, data = ReadingSkills, cex = 1.5, pch = 1,
  col = col1)
## fitted lines
iq <- -30:30/10
cf <- rbind(coef(rs_mix, model = "mean", component = 1:2),
  c(qlogis(0.99), 0))
for(i in 1:3)
  lines(iq, plogis(cf[i, 1] + cf[i, 2] * iq), lwd = 2,
    col = col0[i])

## refit the model including a concomitant variable model
## using the dyslexia information
w <- rnorm(nrow(ReadingSkills),
  c(-1, 1)[as.integer(ReadingSkills$dyslexia)])

## The argument FLXconcomitant can be omitted when specifying
## the model via a three part formula given by
## accuracy ~ iq | 1 | w
## The posteriors from the previously fitted model are used
## for initialization.
library("flexmix")
rs_mix2 <- betamix(accuracy ~ iq, data = ReadingSkills,
  extra_components = extraComponent(type = "uniform",
  coef = 0.99, delta = 0.01), cluster = posterior(rs_mix),
  FLXconcomitant = FLXPmultinom(~w))
coef(rs_mix2, which = "concomitant")
summary(rs_mix2, which = "concomitant")
```