There are two ways to fit a GLM using Stan: with the pre-compiled models in the rstanarm package (e.g. rstanarm::stan_glm), or by writing the model in Stan code and fitting it with rstan::stan. We will work through both approaches here.
First let's select a dataset to which we want to fit our model.
data("mtcars")
Let's decide on the model we want to estimate. For now we will fit a fairly generic model to predict mpg.
library(rstanarm)

fit1 <- rstanarm::stan_glm(mpg / 10 ~ ., data = mtcars)
print(fit1)
Here we fit the default stan model, which puts a normal(0, 1) prior on each of the covariate estimates.
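To confirm exactly which priors were used (including any automatic rescaling rstanarm applies), we can inspect the fit object; a quick check, reusing the fit1 object from above:

```r
# Report the priors actually used when fitting fit1
library(rstanarm)
prior_summary(fit1)
```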
In the rstanarm framework, we can modify the priors on the covariates, either to set a particular prior distribution (e.g. student_t or cauchy) or to enforce greater sparsity (e.g. hs() or hs_plus()). See prior distributions for a description of the prior distributions available.
fit2 <- rstanarm::stan_glm(mpg / 10 ~ ., data = mtcars,
                           prior = cauchy(),
                           adapt_delta = 0.999,
                           iter = 4000)
print(fit2)
For linear models, we also have the choice of putting a more global prior on the coefficient parameters, through a neat math trick which allows us to put a prior on R2. This reflects our overall confidence in the ability of the model to explain our outcome.
fit3 <- rstanarm::stan_lm(mpg / 10 ~ ., data = mtcars,
                          prior = R2(location = 0.3, what = 'mode'),
                          adapt_delta = 0.999)
print(fit3)
At the end of the day, the decision of how to select priors can be based on substantive knowledge (from previous experiments or domain experience) or on model performance (e.g. cross-validation performance).
Now, let's assume these all sampled well & that we have good estimates for our parameters from the three fits.
The first thing we might want to do is compare these posterior estimates.
library(bayesplot)

plot1 <- bayesplot::mcmc_dens_overlay(as.array(fit1), facet_args = list(ncol = 1), pars = c('cyl', 'wt'))
plot2 <- bayesplot::mcmc_dens_overlay(as.array(fit2), facet_args = list(ncol = 1), pars = c('cyl', 'wt'))
plot3 <- bayesplot::mcmc_dens_overlay(as.array(fit3), facet_args = list(ncol = 1), pars = c('cyl', 'wt'))
gridExtra::grid.arrange(plot1, plot2, plot3, ncol = 3)
These priors have very little impact on our estimated coefficient values.
loo
We can compare these models using a leave-one-out approximation.
library(loo)

loo::compare(loo(fit1, k_threshold = 0.7),
             loo(fit2, k_threshold = 0.7),
             loo(fit3, k_threshold = 0.7))
When comparing three models, we get back a ranking of the models in terms of their predictive performance on our data. When comparing two models, a positive difference indicates that the second model is a better fit than the first.
Note that we can also plot the loo object.
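For example, loo objects have a plot method that displays the Pareto k diagnostic for each observation, which helps flag points where the leave-one-out approximation is unreliable (a sketch, reusing fit1 from above):

```r
# Plot the Pareto k diagnostic for each observation in the loo fit
loo1 <- loo(fit1)
plot(loo1, label_points = TRUE)
```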
Let's try fitting a glm (logistic) regression model. This example is borrowed from the rstanarm vignette: vignette("binomial", package = 'rstanarm').
data(wells)
wells$dist100 <- wells$dist / 100

bfit <- stan_glm(switch ~ dist100 + arsenic,
                 data = wells,
                 family = binomial(link = "logit"),
                 prior_intercept = normal(0, 10),
                 QR = TRUE,
                 chains = 2, iter = 2000)
print(bfit)
# prior_summary(bfit)
file_path <- file.path('vignettes', 'stan_glm.stan')
lines <- readLines(file_path, encoding = 'ASCII')
for (n in 1:length(lines)) cat(lines[n], '\n')
Stan code is broken into several blocks; here we use the data, parameters, and model blocks. There are other blocks (functions & generated quantities) but we are not using them here.
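The contents of vignettes/stan_glm.stan are not reproduced above, but a minimal linear-regression program consistent with the data list we prepare below might look like the following (the names N, M, x, y, beta, and sigma match what the rest of this section assumes; the priors shown are illustrative):

```stan
data {
  int<lower=0> N;       // number of observations
  int<lower=0> M;       // number of predictors (incl. intercept)
  matrix[N, M] x;       // design matrix
  vector[N] y;          // outcome
}
parameters {
  vector[M] beta;       // regression coefficients
  real<lower=0> sigma;  // residual standard deviation
}
model {
  beta ~ normal(0, 1);
  sigma ~ cauchy(0, 2.5);
  y ~ normal(x * beta, sigma);
}
```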
In order to fit this model to our data using rstan::stan, we first need to prepare a named list to provide data to the stan model.
ols_estimate <- glm(mpg / 10 ~ ., data = mtcars, x = TRUE)
x <- ols_estimate$x
y <- ols_estimate$y

stan_data <- list(N = length(y),
                  M = ncol(x),
                  x = x,
                  y = as.array(y))
str(stan_data)
Now we are ready to fit our model to our data.
stan_fit <- rstan::stan(file = file_path, data = stan_data)
The final step is to summarize the fit object.
print(stan_fit, pars = c('beta', 'sigma'))
You will notice that this summary is not "pretty" -- i.e., our beta coefficients are not labelled in a pleasing way, as our results from rstanarm were.
We do, however, have access to the x-coefficient names.
ols_estimate$coefficients
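One way to get nicer labels is to extract the posterior draws and attach the design-matrix column names ourselves (a sketch; as.matrix() on a stanfit object and colnames(x) are standard, the renaming is our own addition):

```r
# Extract the beta draws and label them with the design-matrix column names
beta_draws <- as.matrix(stan_fit, pars = 'beta')
colnames(beta_draws) <- colnames(x)
summary(beta_draws)
```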
And, we can still use the bayesplot & loo package functionality, although to use loo we will need to calculate our log-likelihood contributions for each observed value.
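Since our Stan program does not save pointwise log-likelihoods in a generated quantities block, one option is to compute them in R from the posterior draws (a sketch, assuming the stan_data list and stan_fit object from this section):

```r
# Pointwise log-likelihood matrix: one row per posterior draw,
# one column per observation, for a normal likelihood
draws <- rstan::extract(stan_fit)
log_lik <- sapply(seq_len(stan_data$N), function(n)
  dnorm(stan_data$y[n],
        mean = as.vector(draws$beta %*% stan_data$x[n, ]),
        sd = draws$sigma, log = TRUE))
loo::loo(log_lik)
```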
Here's an example of plotting our beta-estimates from this model fit.
bayesplot::mcmc_areas(as.array(stan_fit), regex_pars = 'beta.*')
It's also worth calling out the excellent tool called shinystan.
if (interactive()) launch_shinystan(fit1)