# stan_lm: Bayesian regularized linear models via Stan In rstanarm: Bayesian Applied Regression Modeling via Stan

## Description

Bayesian inference for linear modeling with regularizing priors on the model parameters that are driven by prior beliefs about R^2, the proportion of variance in the outcome attributable to the predictors. See `priors` for an explanation of this critical point. `stan_glm` with `family="gaussian"` also estimates a linear model with normally-distributed errors and allows for various other priors on the coefficients.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58``` ```stan_aov( formula, data, projections = FALSE, contrasts = NULL, ..., prior = R2(stop("'location' must be specified")), prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"), adapt_delta = NULL ) stan_lm( formula, data, subset, weights, na.action, model = TRUE, x = FALSE, y = FALSE, singular.ok = TRUE, contrasts = NULL, offset, ..., prior = R2(stop("'location' must be specified")), prior_intercept = NULL, prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"), adapt_delta = NULL ) stan_lm.wfit( x, y, w, offset = NULL, singular.ok = TRUE, ..., prior = R2(stop("'location' must be specified")), prior_intercept = NULL, prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"), adapt_delta = NULL ) stan_lm.fit( x, y, offset = NULL, singular.ok = TRUE, ..., prior = R2(stop("'location' must be specified")), prior_intercept = NULL, prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"), adapt_delta = NULL ) ```

## Arguments

 `formula, data, subset` Same as `lm`, but we strongly advise against omitting the `data` argument. Unless `data` is specified (and is a data frame) many post-estimation functions (including `update`, `loo`, `kfold`) are not guaranteed to work properly. `projections` For `stan_aov`, a logical scalar (defaulting to `FALSE`) indicating whether `proj` should be called on the fit. `...` Further arguments passed to the function in the rstan package (`sampling`, `vb`, or `optimizing`), corresponding to the estimation method named by `algorithm`. For example, if `algorithm` is `"sampling"` it is possibly to specify `iter`, `chains`, `cores`, `refresh`, etc. `prior` Must be a call to `R2` with its `location` argument specified or `NULL`, which would indicate a standard uniform prior for the R^2. `prior_PD` A logical scalar (defaulting to `FALSE`) indicating whether to draw from the prior predictive distribution instead of conditioning on the outcome. `algorithm` A string (possibly abbreviated) indicating the estimation approach to use. Can be `"sampling"` for MCMC (the default), `"optimizing"` for optimization, `"meanfield"` for variational inference with independent normal distributions, or `"fullrank"` for variational inference with a multivariate normal distribution. See `rstanarm-package` for more details on the estimation algorithms. NOTE: not all fitting functions support all four algorithms. `adapt_delta` Only relevant if `algorithm="sampling"`. See the adapt_delta help page for details. `na.action, singular.ok, contrasts` Same as `lm`, but rarely specified. `model, offset, weights` Same as `lm`, but rarely specified. `x, y` In `stan_lm, stan_aov`, logical scalars indicating whether to return the design matrix and response vector. In `stan_lm.fit or stan_lm.wfit`, a design matrix and response vector. `prior_intercept` Either `NULL` (the default) or a call to `normal`. If a `normal` prior is specified without a `scale`, then the standard deviation is taken to be the marginal standard deviation of the outcome divided by the square root of the sample size, which is legitimate because the marginal standard deviation of the outcome is a primitive parameter being estimated. Note: If using a dense representation of the design matrix —i.e., if the `sparse` argument is left at its default value of `FALSE`— then the prior distribution for the intercept is set so it applies to the value when all predictors are centered. If you prefer to specify a prior on the intercept without the predictors being auto-centered, then you have to omit the intercept from the `formula` and include a column of ones as a predictor, in which case some element of `prior` specifies the prior on it, rather than `prior_intercept`. Regardless of how `prior_intercept` is specified, the reported estimates of the intercept always correspond to a parameterization without centered predictors (i.e., same as in `glm`). `w` Same as in `lm.wfit` but rarely specified.

## Details

The `stan_lm` function is similar in syntax to the `lm` function but rather than choosing the parameters to minimize the sum of squared residuals, samples from the posterior distribution are drawn using MCMC (if `algorithm` is `"sampling"`). The `stan_lm` function has a formula-based interface and would usually be called by users but the `stan_lm.fit` and `stan_lm.wfit` functions might be called by other functions that parse the data themselves and are analogous to `lm.fit` and `lm.wfit` respectively.

In addition to estimating `sigma` — the standard deviation of the normally-distributed errors — this model estimates a positive parameter called `log-fit_ratio`. If it is positive, the marginal posterior variance of the outcome will exceed the sample variance of the outcome by a multiplicative factor equal to the square of `fit_ratio`. Conversely if `log-fit_ratio` is negative, then the model underfits. Given the regularizing nature of the priors, a slight underfit is good.

Finally, the posterior predictive distribution is generated with the predictors fixed at their sample means. This quantity is useful for checking convergence because it is reasonably normally distributed and a function of all the parameters in the model.

The `stan_aov` function is similar to `aov`, but does a Bayesian analysis of variance that is basically equivalent to `stan_lm` with dummy variables. `stan_aov` has a somewhat customized `print` method that prints an ANOVA-like table in addition to the output printed for `stan_lm` models.

## Value

A stanreg object is returned for `stan_lm, stan_aov`.

A stanfit object (or a slightly modified stanfit object) is returned if `stan_lm.fit or stan_lm.wfit` is called directly.

## References

Lewandowski, D., Kurowicka D., and Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis. 100(9), 1989–2001.

The vignettes for `stan_lm` and `stan_aov`, which have more thorough descriptions and examples. http://mc-stan.org/rstanarm/articles/

Also see `stan_glm`, which — if ```family = gaussian(link="identity")``` — also estimates a linear model with normally-distributed errors but specifies different priors.

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12``` ```op <- options(contrasts = c("contr.helmert", "contr.poly")) fit_aov <- stan_aov(yield ~ block + N*P*K, data = npk, prior = R2(0.5), seed = 12345) options(op) print(fit_aov) (fit <- stan_lm(mpg ~ wt + qsec + am, data = mtcars, prior = R2(0.75), # the next line is only to make the example go fast enough chains = 1, iter = 300, seed = 12345, refresh = 0)) plot(fit, "hist", pars = c("wt", "am", "qsec", "sigma"), transformations = list(sigma = "log")) ```

rstanarm documentation built on Feb. 11, 2020, 5:06 p.m.