README.md

tidystan R package

Warning: this package is currently under development and does not yet have a stable API.

This is an R package for those who love both Stan and tidy data. tidystan transforms the posterior output from Stan models into tidy data frames.

There are three(ish) main functions for doing this:

  1. tidy_samples() creates a tidy data frame where an observation is one draw of one parameter.

  2. tidy_samples(wide = TRUE) creates a tidy data frame where the rows are posterior samples and the columns are parameters.

  3. summarise_samples() creates a tidy data frame where an observation is one parameter and the columns give summary statistics (e.g. mean and median) and credible intervals for the parameters' posterior distributions.
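
The three shapes described above can be illustrated without fitting a model. The following is only a sketch in base R: it does not call tidystan, the numbers are made up, and the summary column names (mean, median, lower, upper) are assumptions for illustration rather than the names tidystan produces. The parameter, draw and estimate columns follow the output shown later in this README.

```r
# Toy stand-in for posterior draws: 4 draws of 2 parameters (made-up numbers).
draws <- matrix(c(1.0, 1.2, 0.9, 1.1,
                  2.0, 2.4, 2.2, 1.8),
                ncol = 2,
                dimnames = list(NULL, c("alpha", "sigma")))

# Wide shape: one row per draw, one column per parameter.
wide <- data.frame(draw = seq_len(nrow(draws)), draws)

# Long shape: one row per draw of each parameter.
long <- data.frame(
  parameter = rep(colnames(draws), each = nrow(draws)),
  draw      = rep(seq_len(nrow(draws)), times = ncol(draws)),
  estimate  = as.vector(draws),
  stringsAsFactors = FALSE
)

# Summary shape: one row per parameter, with point summaries and a
# 95% interval taken from the empirical quantiles of the draws.
summary_df <- data.frame(
  parameter = colnames(draws),
  mean      = unname(colMeans(draws)),
  median    = unname(apply(draws, 2, median)),
  lower     = unname(apply(draws, 2, quantile, probs = 0.025)),
  upper     = unname(apply(draws, 2, quantile, probs = 0.975)),
  stringsAsFactors = FALSE
)
```

The long shape suits grouped operations and ggplot2, the wide shape suits computations that combine several parameters within the same draw, and the summary shape suits tables and reports.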

A quick example

Stan models can be fit either by writing your own Stan program and fitting it with the rstan package, or by using a package with pre-built Stan models, such as rstanarm. For brevity, we use stan_glm from rstanarm for this example.

library(rstanarm)
library(tidystan)
library(dplyr)
library(ggplot2)

model <- stan_glm(kid_score ~ mom_hs + mom_iq + mom_age,
                  data = kidiq, chains = 1, iter = 1000)
#> 
#> SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
#> 
#> Chain 1, Iteration:   1 / 1000 [  0%]  (Warmup)
#> Chain 1, Iteration: 100 / 1000 [ 10%]  (Warmup)
#> Chain 1, Iteration: 200 / 1000 [ 20%]  (Warmup)
#> Chain 1, Iteration: 300 / 1000 [ 30%]  (Warmup)
#> Chain 1, Iteration: 400 / 1000 [ 40%]  (Warmup)
#> Chain 1, Iteration: 500 / 1000 [ 50%]  (Warmup)
#> Chain 1, Iteration: 501 / 1000 [ 50%]  (Sampling)
#> Chain 1, Iteration: 600 / 1000 [ 60%]  (Sampling)
#> Chain 1, Iteration: 700 / 1000 [ 70%]  (Sampling)
#> Chain 1, Iteration: 800 / 1000 [ 80%]  (Sampling)
#> Chain 1, Iteration: 900 / 1000 [ 90%]  (Sampling)
#> Chain 1, Iteration: 1000 / 1000 [100%]  (Sampling)
#>  Elapsed Time: 0.184391 seconds (Warm-up)
#>                0.130733 seconds (Sampling)
#>                0.315124 seconds (Total)

We can use tidy_samples() to get a data frame of the samples for each of the parameters. Each row of this data frame is one draw of one parameter.

posterior <- tidy_samples(model$stanfit)

print(posterior)
#> # A tibble: 3,000 × 4
#>    parameter  draw     i estimate
#>        <chr> <int> <int>    <dbl>
#> 1      alpha     1     1 22.18962
#> 2      alpha     2     1 36.87980
#> 3      alpha     3     1 13.70154
#> 4      alpha     4     1 20.77083
#> 5      alpha     5     1 17.71837
#> 6      alpha     6     1 33.91377
#> 7      alpha     7     1 23.97039
#> 8      alpha     8     1 14.24262
#> 9      alpha     9     1 24.22746
#> 10     alpha    10     1 18.32278
#> # ... with 2,990 more rows

With this data frame, we can easily take the beta parameters, label them by their term name, and plot their posterior distributions with ggplot2.

terms <- c("mom_hs", "mom_iq", "mom_age")

coefficients <- posterior %>%
    filter(parameter == "beta") %>%
    mutate(variable = terms[i])

ggplot(coefficients, aes(x = estimate, fill = variable)) +
    geom_density(alpha = 0.4) + 
    geom_vline(aes(xintercept = 0)) + 
    theme_bw() +
    labs(title = "Posterior Distribution of Coefficients")
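
The labelling step in the example above relies on indexing the terms vector by the i column. Here is a minimal sketch of that lookup in base R, using a small made-up data frame in place of the real posterior. It assumes, as the example does, that the index i follows the order of the terms in the model formula. The toy_beta and coef_means names and all the estimates are hypothetical.

```r
# Term names in the order used in the model formula (from the example above).
terms <- c("mom_hs", "mom_iq", "mom_age")

# Made-up stand-in for the beta rows of `posterior`: 2 draws of 3 coefficients.
toy_beta <- data.frame(
  parameter = "beta",
  draw      = rep(1:2, times = 3),
  i         = rep(1:3, each = 2),
  estimate  = c(5.9, 6.1, 0.55, 0.57, 0.21, 0.23),
  stringsAsFactors = FALSE
)

# Indexing `terms` by `i` maps each coefficient index to its term name.
toy_beta$variable <- terms[toy_beta$i]

# Mean of the draws for each labelled coefficient (a grouped summary in base R).
coef_means <- tapply(toy_beta$estimate, toy_beta$variable, mean)
```

Because the lookup is positional, the terms vector must list the predictors in the same order as the model's coefficients. A mismatch would mislabel the densities in the plot without raising an error.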

License

This R package is licensed under the MIT license. See the file LICENSE for more details.



wjones127/tidystan documentation built on May 28, 2017, 4:36 a.m.