# stan_betareg: Bayesian beta regression models via Stan

In rstanarm: Bayesian Applied Regression Modeling via Stan

## Description

Beta regression modeling with optional prior distributions for the coefficients, intercept, and auxiliary parameter `phi` (if applicable).

## Usage

```
stan_betareg(
  formula,
  data,
  subset,
  na.action,
  weights,
  offset,
  link = c("logit", "probit", "cloglog", "cauchit", "log", "loglog"),
  link.phi = NULL,
  model = TRUE,
  y = TRUE,
  x = FALSE,
  ...,
  prior = normal(),
  prior_intercept = normal(),
  prior_z = normal(),
  prior_intercept_z = normal(),
  prior_phi = exponential(),
  prior_PD = FALSE,
  algorithm = c("sampling", "optimizing", "meanfield", "fullrank"),
  adapt_delta = NULL,
  QR = FALSE
)

stan_betareg.fit(
  x,
  y,
  z = NULL,
  weights = rep(1, NROW(x)),
  offset = rep(0, NROW(x)),
  link = c("logit", "probit", "cloglog", "cauchit", "log", "loglog"),
  link.phi = NULL,
  ...,
  prior = normal(),
  prior_intercept = normal(),
  prior_z = normal(),
  prior_intercept_z = normal(),
  prior_phi = exponential(),
  prior_PD = FALSE,
  algorithm = c("sampling", "optimizing", "meanfield", "fullrank"),
  adapt_delta = NULL,
  QR = FALSE
)
```

## Arguments

`formula, data, subset`

Same as `betareg`, but we strongly advise against omitting the `data` argument. Unless `data` is specified (and is a data frame) many post-estimation functions (including `update`, `loo`, `kfold`) are not guaranteed to work properly.

`na.action`

Same as `betareg`, but rarely specified.

`link`

Character specification of the link function used in the model for mu (specified through `x`). Currently, "logit", "probit", "cloglog", "cauchit", "log", and "loglog" are supported.
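As a rough illustration of what each `link` option means, the sketch below evaluates the inverse link (linear predictor to mean) in base R. It assumes the definitions in `stats::make.link` for the first five links and, for "loglog", the definition g(mu) = -log(-log(mu)) used by betareg. `inv_link` is a hypothetical helper written for this example, not part of rstanarm.

```r
# Hypothetical helper: map a linear predictor eta to a mean mu
# under each link name accepted by `link`.
inv_link <- function(eta, link) {
  if (link == "loglog") {
    # Assumed definition (as in betareg): g(mu) = -log(-log(mu))
    exp(-exp(-eta))
  } else {
    # logit, probit, cloglog, cauchit, and log are available in base R
    stats::make.link(link)$linkinv(eta)
  }
}

eta <- c(-2, -1, -0.5)  # arbitrary illustrative values
links <- c("logit", "probit", "cloglog", "cauchit", "log", "loglog")

# One row per value of eta, one column per link
mu <- sapply(links, function(l) inv_link(eta, l))
rownames(mu) <- paste0("eta=", eta)
round(mu, 3)
```

Note that the "log" link only yields a mean inside (0, 1) when the linear predictor is negative, which is why the values of `eta` above are all below zero.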

`link.phi`

If applicable, character specification of the link function used in the model for `phi` (specified through `z`). Currently, "identity", "log" (default), and "sqrt" are supported. Since the "sqrt" link function is known to be unstable, it is advisable to specify a different link function (or to model `phi` as a scalar parameter instead of via a linear predictor by excluding `z` from the `formula` and excluding `link.phi`).

`model, offset, weights`

Same as `betareg`.

`x, y`

In `stan_betareg`, logical scalars indicating whether to return the design matrix and response vector. In `stan_betareg.fit`, a design matrix and response vector.

`...`

Further arguments passed to the function in the rstan package (`sampling`, `vb`, or `optimizing`), corresponding to the estimation method named by `algorithm`. For example, if `algorithm` is `"sampling"` it is possible to specify `iter`, `chains`, `cores`, `refresh`, etc.
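A minimal sketch of forwarding sampler settings through `...` might look like the following. The simulated data, the seed, and the particular values of `chains` and `iter` are arbitrary choices for illustration, not recommendations.

```r
library(rstanarm)

# Simulated proportions (hypothetical data for illustration only)
set.seed(123)
N <- 100
x <- rnorm(N)
mu <- plogis(0.5 + 0.3 * x)
phi <- 10
dat <- data.frame(y = rbeta(N, mu * phi, (1 - mu) * phi), x = x)

# chains, iter, cores, refresh, and seed are forwarded to rstan::sampling
fit <- stan_betareg(
  y ~ x, data = dat,
  algorithm = "sampling",
  chains = 2, iter = 1000, cores = 1,
  refresh = 0,   # suppress progress output
  seed = 123
)
print(fit, digits = 2)
```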

`prior`

The prior distribution for the regression coefficients. `prior` should be a call to one of the various functions provided by rstanarm for specifying priors. The subset of these functions that can be used for the prior on the coefficients can be grouped into several "families":

| Family | Functions |
|---|---|
| Student t family | `normal`, `student_t`, `cauchy` |
| Hierarchical shrinkage family | `hs`, `hs_plus` |
| Laplace family | `laplace`, `lasso` |
| Product normal family | `product_normal` |

See the priors help page for details on the families and how to specify the arguments for all of the functions in the table above. To omit a prior —i.e., to use a flat (improper) uniform prior— `prior` can be set to `NULL`, although this is rarely a good idea.

Note: Unless `QR=TRUE`, if `prior` is from the Student t family or Laplace family, and if the `autoscale` argument to the function used to specify the prior (e.g. `normal`) is left at its default and recommended value of `TRUE`, then the default or user-specified prior scale(s) may be adjusted internally based on the scales of the predictors. See the priors help page and the Prior Distributions vignette for details on the rescaling and the `prior_summary` function for a summary of the priors used for a particular model.
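For instance, priors from the Student t family could be specified as in the sketch below. The data are simulated and the prior locations, scales, and degrees of freedom are arbitrary illustrative values. `prior_summary` is then used to inspect the priors that were actually applied.

```r
library(rstanarm)

# Simulated proportions (hypothetical data for illustration only)
set.seed(1)
N <- 100
x <- rnorm(N)
mu <- plogis(0.5 + 0.3 * x)
dat <- data.frame(y = rbeta(N, mu * 10, (1 - mu) * 10), x = x)

fit <- stan_betareg(
  y ~ x, data = dat,
  prior = student_t(df = 4, location = 0, scale = 2.5),
  prior_intercept = normal(location = 0, scale = 5),
  algorithm = "optimizing"   # for speed only
)

# Shows the priors actually used, including any internal rescaling
prior_summary(fit)
```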

`prior_intercept`

The prior distribution for the intercept. `prior_intercept` can be a call to `normal`, `student_t` or `cauchy`. See the priors help page for details on these functions. To omit a prior on the intercept —i.e., to use a flat (improper) uniform prior— `prior_intercept` can be set to `NULL`.

Note: If using a dense representation of the design matrix —i.e., if the `sparse` argument is left at its default value of `FALSE`— then the prior distribution for the intercept is set so it applies to the value when all predictors are centered. If you prefer to specify a prior on the intercept without the predictors being auto-centered, then you have to omit the intercept from the `formula` and include a column of ones as a predictor, in which case some element of `prior` specifies the prior on it, rather than `prior_intercept`. Regardless of how `prior_intercept` is specified, the reported estimates of the intercept always correspond to a parameterization without centered predictors (i.e., same as in `glm`).
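To make the meaning of a "centered" intercept concrete, the base R sketch below uses ordinary least squares as a stand-in (it is not a beta regression and does not call rstanarm). It shows that the intercept with centered predictors equals the uncentered intercept plus the slope times the predictor mean, i.e. the expected outcome at the average predictor value. The data are simulated for illustration.

```r
set.seed(42)
x <- rnorm(50, mean = 3)
y <- 1 + 2 * x + rnorm(50)

fit_raw      <- lm(y ~ x)                 # intercept refers to x = 0
fit_centered <- lm(y ~ I(x - mean(x)))    # intercept refers to x = mean(x)

a_raw      <- unname(coef(fit_raw)[1])
b          <- unname(coef(fit_raw)[2])
a_centered <- unname(coef(fit_centered)[1])

# The centered intercept is the fitted value at the average predictor value
all.equal(a_centered, a_raw + b * mean(x))
```

A prior placed on the centered intercept is therefore a statement about the outcome at typical predictor values, which is often easier to reason about than the outcome at `x = 0`.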

`prior_z`

Prior distribution for the coefficients in the model for `phi` (if applicable). Same options as for `prior`.

`prior_intercept_z`

Prior distribution for the intercept in the model for `phi` (if applicable). Same options as for `prior_intercept`.

`prior_phi`

The prior distribution for `phi` if it is not modeled as a function of predictors. If `z` variables are specified then `prior_phi` is ignored and `prior_intercept_z` and `prior_z` are used to specify the priors on the intercept and coefficients in the model for `phi`. When applicable, `prior_phi` can be a call to `exponential` to use an exponential distribution, or to one of `normal`, `student_t` or `cauchy` to use a half-normal, half-t, or half-Cauchy prior. See `priors` for details on these functions. To omit a prior (i.e., to use a flat, improper uniform prior), set `prior_phi` to `NULL`.
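As a sketch of the scalar-`phi` case, the example below fits a model without a `| z` part in the formula, so `prior_phi` applies. The simulated data and the prior rate and scale are arbitrary illustrative values.

```r
library(rstanarm)

# Simulated proportions (hypothetical data for illustration only)
set.seed(2)
N <- 100
x <- rnorm(N)
mu <- plogis(0.5 + 0.3 * x)
dat <- data.frame(y = rbeta(N, mu * 10, (1 - mu) * 10), x = x)

# No `| z` part in the formula, so phi is a scalar and prior_phi is used
fit_exp <- stan_betareg(y ~ x, data = dat,
                        prior_phi = exponential(rate = 1),
                        algorithm = "optimizing")

# cauchy() acts as a half-Cauchy here because phi is positive
fit_hc <- stan_betareg(y ~ x, data = dat,
                       prior_phi = cauchy(location = 0, scale = 5),
                       algorithm = "optimizing")

prior_summary(fit_hc)
```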

`prior_PD`

A logical scalar (defaulting to `FALSE`) indicating whether to draw from the prior predictive distribution instead of conditioning on the outcome.
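One possible use of `prior_PD = TRUE` is a prior predictive check, sketched below with simulated data and the default priors. The number of chains, iterations, and draws are arbitrary small values chosen to keep the example quick.

```r
library(rstanarm)

# Simulated proportions (hypothetical data for illustration only)
set.seed(3)
N <- 100
x <- rnorm(N)
mu <- plogis(0.5 + 0.3 * x)
dat <- data.frame(y = rbeta(N, mu * 10, (1 - mu) * 10), x = x)

# prior_PD = TRUE: draws come from the priors only; y is not conditioned on
prior_fit <- stan_betareg(y ~ x, data = dat, prior_PD = TRUE,
                          chains = 1, iter = 500, refresh = 0, seed = 3)

# Outcomes implied by the priors: one row per draw, one column per observation
y_prior <- posterior_predict(prior_fit, draws = 50)
dim(y_prior)
```

Comparing the spread of `y_prior` with plausible values of the outcome can indicate whether the priors are unintentionally informative or implausibly wide.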

`algorithm`

A string (possibly abbreviated) indicating the estimation approach to use. Can be `"sampling"` for MCMC (the default), `"optimizing"` for optimization, `"meanfield"` for variational inference with independent normal distributions, or `"fullrank"` for variational inference with a multivariate normal distribution. See `rstanarm-package` for more details on the estimation algorithms. NOTE: not all fitting functions support all four algorithms.

`adapt_delta`

Only relevant if `algorithm="sampling"`. See the adapt_delta help page for details.

`QR`

A logical scalar defaulting to `FALSE`, but if `TRUE` applies a scaled `qr` decomposition to the design matrix. The transformation does not change the likelihood of the data but is recommended for computational reasons when there are multiple predictors. See the QR-argument documentation page for details on how rstanarm does the transformation and important information about how to interpret the prior distributions of the model parameters when using `QR=TRUE`.
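The idea behind the transformation can be sketched in base R. The example assumes the scaling described on rstanarm's QR-argument page (a thin QR decomposition of the centered design matrix, with Q multiplied by sqrt(n - 1) and R divided by it); it is an illustration of the algebra, not rstanarm's internal code, and the data are simulated.

```r
set.seed(7)
n <- 100
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
X[, "x2"] <- X[, "x2"] + 0.9 * X[, "x1"]     # make the predictors correlated
X <- scale(X, center = TRUE, scale = FALSE)  # center the columns

decomp <- qr(X)
Q_star <- qr.Q(decomp) * sqrt(n - 1)   # assumed scaling of Q
R_star <- qr.R(decomp) / sqrt(n - 1)   # matching scaling of R

# The product reproduces X, so the linear predictor (and likelihood) is unchanged
max(abs(Q_star %*% R_star - X))

# Coefficients on the Q_star scale (theta) relate to the original ones by
# theta = R_star %*% beta, so beta can be recovered with solve(R_star, theta)
beta  <- c(0.5, -0.2)
theta <- R_star %*% beta
all.equal(drop(X %*% beta), drop(Q_star %*% theta))
```

Because the columns of `Q_star` are uncorrelated, a prior specified with `QR = TRUE` applies to `theta` rather than to the coefficients on the original predictors, which is why the interpretation of the priors changes.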

`z`

For `stan_betareg.fit`, a regressor matrix for `phi`. Defaults to an intercept only.

## Details

The `stan_betareg` function is similar in syntax to `betareg` but rather than performing maximum likelihood estimation, full Bayesian estimation is performed (if `algorithm` is `"sampling"`) via MCMC. The Bayesian model adds priors (independent by default) on the coefficients of the beta regression model. The `stan_betareg` function calls the workhorse `stan_betareg.fit` function, but it is also possible to call the latter directly.
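The simulated-data example on this page draws outcomes as `rbeta(N, mu * phi, (1 - mu) * phi)`. Under that mean-precision parameterization the outcome has mean `mu` and variance `mu * (1 - mu) / (1 + phi)`, so larger `phi` means less dispersion. The base R sketch below checks this numerically; the values of `mu` and `phi` are arbitrary.

```r
set.seed(11)
mu  <- 0.3   # mean (arbitrary illustrative value)
phi <- 20    # precision (arbitrary illustrative value)

# Shape parameters implied by the mean-precision parameterization
shape1 <- mu * phi
shape2 <- (1 - mu) * phi

y <- rbeta(1e5, shape1, shape2)

# Compare simulated moments with the theoretical ones
c(sample_mean = mean(y), theoretical_mean = mu)
c(sample_var = var(y), theoretical_var = mu * (1 - mu) / (1 + phi))
```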

## Value

A stanreg object is returned for `stan_betareg`.

A stanfit object (or a slightly modified stanfit object) is returned if `stan_betareg.fit` is called directly.

## References

Ferrari, SLP and Cribari-Neto, F (2004). Beta regression for modeling rates and proportions. Journal of Applied Statistics. 31(7), 799–815.

## See Also

`stanreg-methods` and `betareg`.

The vignette for `stan_betareg`. http://mc-stan.org/rstanarm/articles/

## Examples

```
library(rstanarm)

### Simulated data
N <- 200
x <- rnorm(N, 2, 1)
z <- rnorm(N, 2, 1)
mu <- binomial(link = "logit")$linkinv(1 + 0.2*x)
phi <- exp(1.5 + 0.4*z)
y <- rbeta(N, mu * phi, (1 - mu) * phi)
hist(y, col = "dark grey", border = FALSE, xlim = c(0,1))
fake_dat <- data.frame(y, x, z)

fit <- stan_betareg(
  y ~ x | z, data = fake_dat,
  link = "logit",
  link.phi = "log",
  algorithm = "optimizing" # just for speed of example
)
print(fit, digits = 2)
```