betareg: Beta Regression for Rates and Proportions In betareg: Beta Regression

Description

Fit beta regression models for rates and proportions via maximum likelihood using a parametrization with mean (depending through a link function on the covariates) and precision parameter (called phi).

Usage

 ``` 1 2 3 4 5 6 7 8 9 10``` ```betareg(formula, data, subset, na.action, weights, offset, link = c("logit", "probit", "cloglog", "cauchit", "log", "loglog"), link.phi = NULL, type = c("ML", "BC", "BR"), dist = c("beta", "cbeta4", "cbetax"), nu = NULL, control = betareg.control(...), model = TRUE, y = TRUE, x = FALSE, ...) betareg.fit(x, y, z = NULL, weights = NULL, offset = NULL, link = "logit", link.phi = "log", type = "ML", control = betareg.control(), dist = "beta", nu = NULL) ```

Arguments

 `formula` symbolic description of the model (of type `y ~ x` or `y ~ x | z`; for details see below). `data, subset, na.action` arguments controlling formula processing via `model.frame`. `weights` optional numeric vector of case weights. `offset` optional numeric vector with an a priori known component to be included in the linear predictor for the mean. In `betareg.fit`, `offset` may also be a list of two offsets for the mean and precision equation, respectively. `link` character specification of the link function in the mean model (mu). Currently, `"logit"`, `"probit"`, `"cloglog"`, `"cauchit"`, `"log"`, `"loglog"` are supported. Alternatively, an object of class `"link-glm"` can be supplied. `link.phi` character specification of the link function in the precision model (phi). Currently, `"identity"`, `"log"`, `"sqrt"` are supported. The default is `"log"` unless `formula` is of type `y ~ x` where the default is `"identity"` (for backward compatibility). Alternatively, an object of class `"link-glm"` can be supplied. `type` character specification of the type of estimator. Currently, maximum likelihood (`"ML"`), ML with bias correction (`"BC"`), and ML with bias reduction (`"BR"`) are supported. `dist` character specification of the response distribution. If all response observation are in (0, 1) the default is the standard beta distribution. In the presence of boundary 0/1 observations, a censored constrained four-parameter beta distribution (`"cbeta4"`) or a censored exponential mixture beta distribution (`"cbetax"`) can be used. `nu` numeric. The value of the (fixed) `nu` parameter in case a censored `dist` is used. The default is to estimate `nu` in case of the `"cbetax"` distribution. `control` a list of control arguments specified via `betareg.control`. `model, y, x` logicals. If `TRUE` the corresponding components of the fit (model frame, response, model matrix) are returned. For `betareg.fit`, `x` should be a numeric regressor matrix and `y` should be the numeric response vector (with values in (0,1)). `z` numeric matrix. Regressor matrix for the precision model, defaulting to an intercept only. `...` arguments passed to `betareg.control`.

Details

Beta regression as suggested by Ferrari and Cribari-Neto (2004) and extended by Simas, Barreto-Souza, and Rocha (2010) is implemented in `betareg`. It is useful in situations where the dependent variable is continuous and restricted to the unit interval (0, 1), e.g., resulting from rates or proportions. It is modeled to be beta-distributed with parametrization using mean and precision parameter (called phi). The mean is linked, as in generalized linear models (GLMs), to the responses through a link function and a linear predictor. Additionally, the precision parameter phi can be linked to another (potentially overlapping) set of regressors through a second link function, resulting in a model with variable dispersion. Estimation is performed by maximum likelihood (ML) via `optim` using analytical gradients and (by default) starting values from an auxiliary linear regression of the transformed response. Subsequently, the `optim` result may be enhanced by an additional Fisher scoring iteration using analytical gradients and expected information. This slightly improves the optimization by moving the gradients even closer to zero (for `type = "ML"` and `"BC"`) or solving the bias-adjusted estimating equations (for `type = "BR"`). For the former two estimators, the optional Fisher scoring can be disabled by setting `fsmaxit = 0` in the control arguments. See Cribari-Neto and Zeileis (2010) and Grün et al. (2012) for details.

In the beta regression as introduced by Ferrari and Cribari-Neto (2004), the mean of the response is linked to a linear predictor described by `y ~ x1 + x2` using a `link` function while the precision parameter phi is assumed to be constant. Simas et al. (2009) suggest to extend this model by linking phi to an additional set of regressors (`z1 + z2`, say): In `betareg` this can be specified in a formula of type `y ~ x1 + x2 | z1 + z2` where the regressors in the two parts can be overlapping. In the precision model (for phi), the link function `link.phi` is used. The default is a `"log"` link unless no precision model is specified. In the latter case (i.e., when the formula is of type `y ~ x1 + x2`), the `"identity"` link is used by default for backward compatibility.

Simas et al. (2009) also suggest further extensions (non-linear specificiations, bias correction) which are not yet implemented in `betareg`. However, Kosmidis and Firth (2010) discuss general algorithms for bias correction/reduction, both of which are available in `betareg` by setting the `type` argument accordingly. (Technical note: In case, either bias correction or reduction is requested, the second derivative of the inverse link function is required for `link` and `link.phi`. If the two links are specified by their names (as done by default in `betareg`), then the `"link-glm"` objects are enhanced automatically by the required additional `d2mu.deta` function. However, if a `"link-glm"` object is supplied directly by the user, it needs to have the `d2mu.deta` function or, for backward compatibility, `dmu.deta`.)

The main parameters of interest are the coefficients in the linear predictor of the mean model. The additional parameters in the precision model (phi) can either be treated as full model parameters (default) or as nuisance parameters. In the latter case the estimation does not change, only the reported information in output from `print`, `summary`, or `coef` (among others) will be different. See also `betareg.control`.

A set of standard extractor functions for fitted model objects is available for objects of class `"betareg"`, including methods to the generic functions `print`, `summary`, `plot`, `coef`, `vcov`, `logLik`, `residuals`, `predict`, `terms`, `model.frame`, `model.matrix`, `cooks.distance` and `hatvalues` (see `influence.measures`), `gleverage` (new generic), `estfun` and `bread` (from the sandwich package), and `coeftest` (from the lmtest package).

See `predict.betareg`, `residuals.betareg`, `plot.betareg`, and `summary.betareg` for more details on all methods.

The original version of the package was written by Alexandre B. Simas and Andrea V. Rocha (up to version 1.2). Starting from version 2.0-0 the code was rewritten by Achim Zeileis.

Value

`betareg` returns an object of class `"betareg"`, i.e., a list with components as follows. `betareg.fit` returns an unclassed list with components up to `converged`.

 `coefficients` a list with elements `"mean"` and `"precision"` containing the coefficients from the respective models, `residuals` a vector of raw residuals (observed - fitted), `fitted.values` a vector of fitted means, `optim` output from the `optim` call for maximizing the log-likelihood(s), `method` the method argument passed to the `optim` call, `control` the control arguments passed to the `optim` call, `start` the starting values for the parameters passed to the `optim` call, `weights` the weights used (if any), `offset` a list of offset vectors used (if any), `n` number of observations, `nobs` number of observations with non-zero weights, `df.null` residual degrees of freedom in the null model (constant mean and dispersion), i.e., `n - 2`, `df.residual` residual degrees of freedom in the fitted model, `phi` logical indicating whether the precision (phi) coefficients will be treated as full model parameters or nuisance parameters in subsequent calls to `print`, `summary`, `coef` etc., `loglik` log-likelihood of the fitted model, `vcov` covariance matrix of all parameters in the model, `pseudo.r.squared` pseudo R-squared value (squared correlation of linear predictor and link-transformed response), `link` a list with elements `"mean"` and `"precision"` containing the link objects for the respective models, `converged` logical indicating successful convergence of `optim`, `call` the original function call, `formula` the original formula, `terms` a list with elements `"mean"`, `"precision"` and `"full"` containing the terms objects for the respective models, `levels` a list with elements `"mean"`, `"precision"` and `"full"` containing the levels of the categorical regressors, `contrasts` a list with elements `"mean"` and `"precision"` containing the contrasts corresponding to `levels` from the respective models, `model` the full model frame (if `model = TRUE`), `y` the response proportion vector (if `y = TRUE`), `x` a list with elements `"mean"` and `"precision"` containing the model matrices from the respective models (if `x = TRUE`).

References

Cribari-Neto, F., and Zeileis, A. (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. http://www.jstatsoft.org/v34/i02/.

Ferrari, S.L.P., and Cribari-Neto, F. (2004). Beta Regression for Modeling Rates and Proportions. Journal of Applied Statistics, 31(7), 799–815.

Grün, B., Kosmidis, I., and Zeileis, A. (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, 48(11), 1–25. http://www.jstatsoft.org/v48/i11/.

Kosmidis, I., and Firth, D. (2010). A Generic Algorithm for Reducing Bias in Parametric Estimation. Electronic Journal of Statistics, 4, 1097–1112.

Simas, A.B., Barreto-Souza, W., and Rocha, A.V. (2010). Improved Estimators for a General Class of Beta Regression Models. Computational Statistics & Data Analysis, 54(2), 348–366.

See Also

`summary.betareg`, `predict.betareg`, `residuals.betareg`, `Formula`

Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33``` ```options(digits = 4) ## Section 4 from Ferrari and Cribari-Neto (2004) data("GasolineYield", package = "betareg") data("FoodExpenditure", package = "betareg") ## Table 1 gy <- betareg(yield ~ batch + temp, data = GasolineYield) summary(gy) ## Table 2 fe_lin <- lm(I(food/income) ~ income + persons, data = FoodExpenditure) library("lmtest") bptest(fe_lin) fe_beta <- betareg(I(food/income) ~ income + persons, data = FoodExpenditure) summary(fe_beta) ## nested model comparisons via Wald and LR tests fe_beta2 <- betareg(I(food/income) ~ income, data = FoodExpenditure) lrtest(fe_beta, fe_beta2) waldtest(fe_beta, fe_beta2) ## Section 3 from online supplements to Simas et al. (2010) ## mean model as in gy above ## precision model with regressor temp gy2 <- betareg(yield ~ batch + temp | temp, data = GasolineYield) ## MLE column in Table 19 summary(gy2) ## LRT row in Table 18 lrtest(gy, gy2) ```

Example output

```Call:
betareg(formula = yield ~ batch + temp, data = GasolineYield)

Standardized weighted residuals 2:
Min     1Q Median     3Q    Max
-2.875 -0.815  0.160  0.838  2.048

Coefficients (mean model with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.159571   0.182325  -33.78  < 2e-16 ***
batch1       1.727729   0.101229   17.07  < 2e-16 ***
batch2       1.322597   0.117902   11.22  < 2e-16 ***
batch3       1.572310   0.116105   13.54  < 2e-16 ***
batch4       1.059714   0.102360   10.35  < 2e-16 ***
batch5       1.133752   0.103523   10.95  < 2e-16 ***
batch6       1.040162   0.106036    9.81  < 2e-16 ***
batch7       0.543692   0.109127    4.98  6.3e-07 ***
batch8       0.495901   0.108926    4.55  5.3e-06 ***
batch9       0.385793   0.118593    3.25   0.0011 **
temp         0.010967   0.000413   26.58  < 2e-16 ***

Phi coefficients (precision model with identity link):
Estimate Std. Error z value Pr(>|z|)
(phi)      440        110       4  6.3e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type of estimator: ML (maximum likelihood)
Log-likelihood: 84.8 on 12 Df
Pseudo R-squared: 0.962
Number of iterations: 51 (BFGS) + 3 (Fisher scoring)
Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

as.Date, as.Date.numeric

studentized Breusch-Pagan test

data:  fe_lin
BP = 5.9, df = 2, p-value = 0.05

Call:
betareg(formula = I(food/income) ~ income + persons, data = FoodExpenditure)

Standardized weighted residuals 2:
Min     1Q Median     3Q    Max
-2.782 -0.444  0.202  0.685  1.876

Coefficients (mean model with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.62255    0.22385   -2.78   0.0054 **
income      -0.01230    0.00304   -4.05  5.1e-05 ***
persons      0.11846    0.03534    3.35   0.0008 ***

Phi coefficients (precision model with identity link):
Estimate Std. Error z value Pr(>|z|)
(phi)    35.61       8.08    4.41    1e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type of estimator: ML (maximum likelihood)
Log-likelihood: 45.3 on 4 Df
Pseudo R-squared: 0.388
Number of iterations: 28 (BFGS) + 4 (Fisher scoring)
Likelihood ratio test

Model 1: I(food/income) ~ income + persons
Model 2: I(food/income) ~ income
#Df LogLik Df Chisq Pr(>Chisq)
1   4   45.3
2   3   40.5 -1  9.65     0.0019 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Wald test

Model 1: I(food/income) ~ income + persons
Model 2: I(food/income) ~ income
Res.Df Df Chisq Pr(>Chisq)
1     34
2     35 -1  11.2      8e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Call:
betareg(formula = yield ~ batch + temp | temp, data = GasolineYield)

Standardized weighted residuals 2:
Min     1Q Median     3Q    Max
-2.540 -0.779 -0.117  0.862  2.942

Coefficients (mean model with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.923236   0.183526  -32.27  < 2e-16 ***
batch1       1.601988   0.063856   25.09  < 2e-16 ***
batch2       1.297266   0.099100   13.09  < 2e-16 ***
batch3       1.565338   0.099739   15.69  < 2e-16 ***
batch4       1.030072   0.063288   16.28  < 2e-16 ***
batch5       1.154163   0.065643   17.58  < 2e-16 ***
batch6       1.019445   0.066351   15.36  < 2e-16 ***
batch7       0.622259   0.065632    9.48  < 2e-16 ***
batch8       0.564583   0.060185    9.38  < 2e-16 ***
batch9       0.359439   0.067141    5.35  8.6e-08 ***
temp         0.010359   0.000436   23.75  < 2e-16 ***

Phi coefficients (precision model with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.36409    1.22578    1.11     0.27
temp         0.01457    0.00362    4.03  5.7e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type of estimator: ML (maximum likelihood)
Log-likelihood:   87 on 13 Df
Pseudo R-squared: 0.952
Number of iterations: 33 (BFGS) + 28 (Fisher scoring)
Likelihood ratio test

Model 1: yield ~ batch + temp
Model 2: yield ~ batch + temp | temp
#Df LogLik Df Chisq Pr(>Chisq)
1  12   84.8
2  13   87.0  1  4.36      0.037 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

betareg documentation built on May 31, 2017, 3:12 a.m.