spread_rvars: Extract draws from a Bayesian model into tidy data frames of...

gather_rvars    R Documentation

Extract draws from a Bayesian model into tidy data frames of random variables

Description

Extract draws from a Bayesian model for one or more variables (possibly with named dimensions) into one of two types of long-format data frames of posterior::rvar objects.

Usage

gather_rvars(model, ..., ndraws = NULL, seed = NULL)

spread_rvars(model, ..., ndraws = NULL, seed = NULL)

Arguments

model

A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models.

...

Expressions in the form of variable_name[dimension_1, dimension_2, ...]. See Details.

ndraws

The number of draws to return, or NULL to return all draws.

seed

A seed to use when subsampling draws (i.e. when ndraws is not NULL).
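As a rough sketch of how ndraws and seed work together, the snippet below subsamples the RankCorr example fit (the ggdist dataset used in the Examples section of this page) down to 10 draws, twice, with the same seed. The names sub_a and sub_b are illustrative only, and the final comparison assumes that an identical seed selects an identical subset of draws.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# keep only 10 randomly chosen draws; the seed makes the subsample repeatable
sub_a <- RankCorr %>% spread_rvars(tau[i], ndraws = 10, seed = 1234)
sub_b <- RankCorr %>% spread_rvars(tau[i], ndraws = 10, seed = 1234)

# number of draws held by the rvar column after subsampling
posterior::ndraws(sub_a$tau)

# compare the underlying draws arrays of the two subsamples
all.equal(posterior::draws_of(sub_a$tau), posterior::draws_of(sub_b$tau))
```

Leaving ndraws as NULL (the default) skips subsampling, in which case seed has no effect.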

Details

Imagine a JAGS or Stan fit named model. The model may contain a variable named b[i,v] (in the JAGS or Stan language) with dimension i in 1:100 and dimension v in 1:3. However, the default format for draws returned from JAGS or Stan in R will not reflect this indexing structure; instead, the draws will have multiple columns with names like "b[1,1]", "b[2,1]", etc.

spread_rvars and gather_rvars provide a straightforward syntax to translate these columns back into properly-indexed rvars in two different tidy data frame formats, optionally recovering dimension types (e.g. factor levels) as they do so.
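To make the contrast concrete, here is a small sketch using the RankCorr example fit from ggdist (the dataset used in the Examples section of this page) in place of the hypothetical model described above. It first lists the flat variable names and then re-indexes one variable. The names flat_names and indexed are illustrative, and the use of posterior::as_draws_df() to inspect the flat names is just one convenient option.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# flat variable names as stored in the fit; these include names like "b[1,1]"
flat_names <- posterior::variables(posterior::as_draws_df(RankCorr))
head(flat_names)

# the same variable, re-indexed: one row per (i, j) pair,
# with the draws for that cell held in an rvar column named "b"
indexed <- RankCorr %>% spread_rvars(b[i, j])
indexed
```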

spread_rvars will spread names of variables in the model across the data frame as column names, whereas gather_rvars will gather variable names into a single column named ".variable" and place values of variables into a column named ".value". To use naming schemes from other packages (such as broom), consider passing results through functions like to_broom_names() or to_ggmcmc_names().

For example, spread_rvars(model, a[i], b[i,v]) might return a data frame with:

  • column "i": value in 1:100

  • column "v": value in 1:3

  • column "a": rvar containing draws from "a[i]"

  • column "b": rvar containing draws from "b[i,v]"

gather_rvars(model, a[i], b[i,v]) on the same model would return a data frame with:

  • column "i": value in 1:100

  • column "v": value in 1:3, or NA on rows where ".variable" is "a".

  • column ".variable": value in c("a", "b").

  • column ".value": rvar containing draws from "a[i]" (when ".variable" is "a") or "b[i,v]" (when ".variable" is "b")
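The two layouts described above can be compared side by side. The sketch below substitutes the RankCorr example fit from ggdist for the hypothetical model, with tau[i] and b[i, j] standing in for a[i] and b[i,v]; the names wide and long are illustrative only.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# wide layout: index columns plus one rvar column per variable
wide <- RankCorr %>% spread_rvars(tau[i], b[i, j])
names(wide)

# long layout: index columns plus ".variable" and ".value"
long <- RankCorr %>% gather_rvars(tau[i], b[i, j])
names(long)

# rows for "tau" have no j index, so j is NA on those rows
unique(long$j[long$.variable == "tau"])
```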

spread_rvars and gather_rvars can use type information applied to the model object by recover_types() to convert columns back into their original types. This is particularly helpful if some of the dimensions in your model were originally factors. For example, if the v dimension in the original data frame data was a factor with levels c("a","b","c"), then we could use recover_types before spread_rvars:

model %>%
 recover_types(data) %>%
 spread_rvars(b[i,v])

Which would return the same data frame as above, except the "v" column would be a value in c("a","b","c") instead of 1:3.
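A runnable variant of this idea, sketched with the RankCorr example fit from ggdist, is shown below. RankCorr carries no original data frame, so a list with made-up labels "a", "b", "c" is passed to recover_types() in place of data; this assumes the i index of tau has three levels. The name typed is illustrative.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# attach (hypothetical) type information for the i index, then extract
typed <- RankCorr %>%
  recover_types(list(i = factor(c("a", "b", "c")))) %>%
  spread_rvars(tau[i])

# the i column now carries the labels rather than integer positions
typed$i
```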

For variables that do not share the same subscripts (or share some but not all subscripts), we can supply their specifications separately. For example, if we have a variable d[i] with the same i subscript as b[i,v], and a variable x with no subscripts, we could do this:

spread_rvars(model, x, d[i], b[i,v])

Which is roughly equivalent to this:

spread_rvars(model, x) %>%
 inner_join(spread_rvars(model, d[i])) %>%
 inner_join(spread_rvars(model, b[i,v]))

Similarly, this:

gather_rvars(model, x, d[i], b[i,v])

Is roughly equivalent to this:

bind_rows(
 gather_rvars(model, x),
 gather_rvars(model, d[i]),
 gather_rvars(model, b[i,v])
)

The c and cbind functions can be used to combine multiple variable names that have the same dimensions. For example, if we have several variables with the same subscripts i and v, we could do either of these:

spread_rvars(model, c(w, x, y, z)[i,v])
spread_rvars(model, cbind(w, x, y, z)[i,v])  # equivalent

Each of which is roughly equivalent to this:

spread_rvars(model, w[i,v], x[i,v], y[i,v], z[i,v])

Besides being more compact, the c()-style syntax is currently also slightly faster (though that may change).
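As a small check of that equivalence, the sketch below extracts two variables that share the i subscript in the RankCorr example fit from ggdist, once with the c()-style syntax and once with separate specifications. The names combined and separate are illustrative only.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# compact form: both variables share the same [i] subscript
combined <- RankCorr %>% spread_rvars(c(tau, u_tau)[i])

# expanded form: one specification per variable
separate <- RankCorr %>% spread_rvars(tau[i], u_tau[i])

# both results should have the same columns and the same number of rows
names(combined)
names(separate)
```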

Dimensions can be left nested in the resulting rvar objects by leaving their names blank; e.g. spread_rvars(model, b[i,]) will place the first index (i) into rows of the data frame but leave the second index nested in the b column (see Examples below).
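The effect of a blank index on the shape of the result can be inspected directly. The sketch below uses the RankCorr example fit from ggdist; the names full and nested are illustrative only.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# both indices named: one row per (i, j) combination
full <- RankCorr %>% spread_rvars(b[i, j])

# second index left blank: one row per i, with the second
# dimension kept inside the rvar column
nested <- RankCorr %>% spread_rvars(b[i, ])

nrow(full)
nrow(nested)
dim(nested$b)  # rows of the data frame by the nested dimension
```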

Value

A data frame.

Author(s)

Matthew Kay

See Also

spread_draws(), recover_types(), compose_data(). See also posterior::rvar() and posterior::as_draws_rvars(), the functions that power spread_rvars and gather_rvars.

Examples


library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

RankCorr %>%
  spread_rvars(b[i, j])

# leaving an index out nests the index in the column containing the rvar
RankCorr %>%
  spread_rvars(b[i, ])

RankCorr %>%
  spread_rvars(b[i, j], tau[i], u_tau[i])

# gather_rvars places variables and values in a longer format data frame
RankCorr %>%
  gather_rvars(b[i, j], tau[i], typical_r)


mjskay/tidybayes documentation built on April 24, 2024, 11:04 p.m.