Once you have built your full specification blueprint and feel comfortable with how the pipeline is executed, you can implement a full multiverse-style analysis. Simply use `run_multiverse(<your expanded grid object>)`:
```r
library(tidyverse)
library(multitool)

# create some data
the_data <-
  data.frame(
    id  = 1:500,
    iv1 = rnorm(500),
    iv2 = rnorm(500),
    iv3 = rnorm(500),
    mod = rnorm(500),
    dv1 = rnorm(500),
    dv2 = rnorm(500),
    include1 = rbinom(500, size = 1, prob = .1),
    include2 = sample(1:3, size = 500, replace = TRUE),
    include3 = rnorm(500)
  )

# create a pipeline blueprint
full_pipeline <-
  the_data |>
  add_filters(include1 == 0, include2 != 3, include3 > -2.5) |>
  add_variables(var_group = "ivs", iv1, iv2, iv3) |>
  add_variables(var_group = "dvs", dv1, dv2) |>
  add_model("linear model", lm({dvs} ~ {ivs} * mod))

# expand the pipeline
expanded_pipeline <- expand_decisions(full_pipeline)

# run the multiverse
multiverse_results <- run_multiverse(expanded_pipeline)

multiverse_results
```
The result will be another tibble with various list columns. It will always contain a list column named `specifications`, which holds all the information you generated in your blueprint. Next, there will be a list column containing your fitted models, labelled `model_fitted`.
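A quick way to confirm this structure is to look at the columns directly. A minimal sketch, using `dplyr::glimpse()` (loaded with the tidyverse):

```r
# peek at the list columns in the multiverse results tibble
multiverse_results |>
  glimpse()
```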
There are two main ways to unpack and examine `multitool` results. The first is by using `tidyr::unnest()`.
Inside the `model_fitted` column, `multitool` gives us four columns: `model_parameters`, `model_performance`, `model_warnings`, and `model_messages`.
```r
multiverse_results |>
  unnest(model_fitted)
```
The `model_parameters` column gives you the result of calling `parameters::parameters()` on each model in your grid, which is a `data.frame` of model coefficients and their associated standard errors, confidence intervals, test statistics, and p-values.
```r
multiverse_results |>
  unnest(model_fitted) |>
  unnest(model_parameters)
```
The `model_performance` column gives fit statistics, such as r^2^ or AIC and BIC values, computed by running `performance::performance()` on each model in your grid.
```r
multiverse_results |>
  unnest(model_fitted) |>
  unnest(model_performance)
```
The `model_messages` and `model_warnings` columns contain information provided by the modeling function. If something went wrong, or you need to know something about a particular model, these columns will have captured the messages and warnings printed by the modeling function.
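For example, following the same `unnest()` pattern as above, here is a sketch for pulling out the captured warnings:

```r
# unnest the model_warnings column to inspect captured warnings
multiverse_results |>
  unnest(model_fitted) |>
  unnest(model_warnings)
```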
I wrote wrappers around the `tidyr::unnest()` workflow. The main function is `reveal()`. Pass a multiverse results object to `reveal()` and tell it which columns to grab by indicating the column name in the `.what` argument:
```r
multiverse_results |>
  reveal(.what = model_fitted)
```
If you want to get straight to a specific result, you can specify a sub-list with the `.which` argument:
```r
multiverse_results |>
  reveal(.what = model_fitted, .which = model_parameters)
```
## `reveal_model_*`
`multitool` will run and save anything you put in your pipeline, but most often you will want to look at model parameters and/or performance. To that end, there is a set of convenience functions for getting at the most common multiverse results: `reveal_model_parameters()`, `reveal_model_performance()`, `reveal_model_messages()`, and `reveal_model_warnings()`.
`reveal_model_parameters()` unpacks the model parameters in your multiverse:
```r
multiverse_results |>
  reveal_model_parameters()
```
`reveal_model_performance()` unpacks the model performance:
```r
multiverse_results |>
  reveal_model_performance()
```
You can also choose to expand your decision grid with `.unpack_specs` to see which decisions produced which result. You have two options for unpacking your decisions: `wide` or `long`. If you set `.unpack_specs = 'wide'`, you get one column per decision variable, exactly as your decisions appeared in your grid.
```r
multiverse_results |>
  reveal_model_parameters(.unpack_specs = "wide")
```
If you set `.unpack_specs = 'long'`, your decisions get stacked into two columns: `decision_set` and `alternatives`. This format is nice for plotting a particular result from a multiverse analysis across different decision alternatives.
```r
multiverse_results |>
  reveal_model_performance(.unpack_specs = "long")
```
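For instance, here is a minimal sketch of such a plot, assuming ggplot2 (loaded with the tidyverse) and the `r2` column produced by `reveal_model_performance()`:

```r
# distribution of r^2 across the alternatives of each decision set
multiverse_results |>
  reveal_model_performance(.unpack_specs = "long") |>
  ggplot(aes(x = alternatives, y = r2)) +
  geom_jitter(width = .1, alpha = .5) +
  facet_wrap(~ decision_set, scales = "free_x")
```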
Unpacking specifications alongside specific results allows us to examine the effects of our pipeline decisions.
A powerful way to organize these results is to summarize a specific results column, say the r^2^ values of our model, over the entire multiverse. `condense()` takes a result column and summarizes it with the `.how` argument, which takes a list in the form `list(<a name you pick> = <summary function>)`.
`.how` will create a column named `<column being condensed>_<summary function name provided>`. In this case, we get `r2_mean` and `r2_median`.
```r
# model performance r2 summaries
multiverse_results |>
  reveal_model_performance() |>
  condense(r2, list(mean = mean, median = median))

# model parameters for our predictor of interest
multiverse_results |>
  reveal_model_parameters() |>
  filter(str_detect(parameter, "iv")) |>
  condense(coefficient, list(mean = mean, median = median))
```
In the last example, we filtered our multiverse results to look at our predictors `iv*` to see what the mean and median effect was (over all combinations of decisions) on our outcomes.
However, we had three versions of our predictor and two outcomes, so combining `dplyr::group_by()` with `condense()` might be more informative:
```r
multiverse_results |>
  reveal_model_parameters(.unpack_specs = "wide") |>
  filter(str_detect(parameter, "iv")) |>
  group_by(ivs, dvs) |>
  condense(coefficient, list(mean = mean, median = median))
```
If we are interested in all the terms of the model, we can leverage `group_by()` further:
```r
multiverse_results |>
  reveal_model_parameters(.unpack_specs = "wide") |>
  group_by(parameter, dvs) |>
  condense(coefficient, list(mean = mean, median = median))
```