knitr::opts_chunk$set(collapse = TRUE, fig.width = 7)
Sys.setenv(LANG = "en")
This vignette describes the workflow of linear regression modeling in the multiverse with the following functions:
formula_branch(), add_formula_branch(): create branches for regression formulas and add them to an mverse object.
lm_mverse(): fit a linear model with the given formula branches.
summary(): provide a summary of the fitted models in different branches.
spec_curve(): display the specification curve of a model.
library(mverse)
We will use the Boston housing dataset [@boston] as an example. This dataset has 506 observations on 14 variables and is extensively used in regression analyses and algorithm benchmarks. The objective is to predict the median value of a home (medv) from the feature variables.
dplyr::glimpse(MASS::Boston) # preview the columns and their types
mverse
To perform a linear regression in the multiverse, we create a formula branch with all the models we wish to explore, add it to the mverse object, and execute lm() on each universe by calling lm_mverse().
Create a multiverse with mverse.
mv <- create_multiverse(MASS::Boston)
We can explore models of the median home value (medv) on different combinations of the following explanatory variables: proportion of adults without some high school education and proportion of male workers classified as laborers (lstat), average number of rooms per dwelling (rm), per capita crime rate (crim), and property tax rate (tax).
Create the models with formula_branch().
formulas <- formula_branch(
  medv ~ log(lstat) * rm,
  medv ~ log(lstat) * tax,
  medv ~ log(lstat) * tax * rm
)
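Each formula above uses R's `*` operator, which expands to the main effects plus their interactions. As a quick check outside of mverse, base R's `terms()` lists the terms each formula would fit. This is a minimal sketch; `candidate_formulas` and `term_labels` are illustrative names, not part of the package.

```r
# Illustrative only: inspect the model terms implied by each candidate formula.
candidate_formulas <- list(
  medv ~ log(lstat) * rm,
  medv ~ log(lstat) * tax,
  medv ~ log(lstat) * tax * rm
)

# terms() expands `a * b` into `a + b + a:b` without needing any data.
term_labels <- lapply(
  candidate_formulas,
  function(f) attr(terms(f), "term.labels")
)

term_labels[[1]]
# "log(lstat)" "rm" "log(lstat):rm"
```

The three-way formula expands to seven terms: three main effects, three two-way interactions, and one three-way interaction.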
Add the models to the multiverse mv.
mv <- mv |> add_formula_branch(formulas)
Fit lm() across mv using lm_mverse().
lm_mverse(mv)
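Conceptually, fitting across the multiverse means fitting one lm() per formula branch option. The base R sketch below shows that idea without mverse. It is not the package's implementation, and `boston`, `candidate_formulas`, `fits`, and `n_coef` are names chosen here for illustration.

```r
# Conceptual sketch in base R; this is NOT how mverse is implemented.
boston <- MASS::Boston

candidate_formulas <- list(
  medv ~ log(lstat) * rm,
  medv ~ log(lstat) * tax,
  medv ~ log(lstat) * tax * rm
)

# Fit one linear model per candidate formula on the same data.
fits <- lapply(candidate_formulas, function(f) lm(f, data = boston))

# Number of estimated coefficients per model, including the intercept.
n_coef <- sapply(fits, function(fit) length(coef(fit)))
n_coef
```

Keeping the fits in a list makes it easy to apply the same summary function to every model, which is the bookkeeping that lm_mverse() and summary() take care of for you.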
By default, summary() gives the parameter estimates for each model. You can also output other information by changing the output parameter.
summary(mv)
Changing output to "df" yields the degrees of freedom table.
summary(mv, output = "df")
Other options include F statistics (output = "f")
summary(mv, output = "f")
and $R^2$ (output = "r").
# output R-squared by `r.squared` or "r"
summary(mv, output = "r")
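For a single model, base R exposes analogous quantities through summary() on an lm fit. The sketch below shows where degrees of freedom, the F statistic, and $R^2$ come from for one of the formulas. It is meant as orientation only and does not claim to reproduce the exact layout of the mverse tables; `fit` and `s` are illustrative names.

```r
# Sketch: the single-model quantities behind the "df", "f" and "r" outputs.
fit <- lm(medv ~ log(lstat) * rm, data = MASS::Boston)
s <- summary(fit)

# Degrees of freedom: estimated coefficients, residual df, total coefficients.
s$df

# F statistic with its numerator and denominator degrees of freedom.
s$fstatistic

# R-squared and adjusted R-squared.
s$r.squared
s$adj.r.squared
```

With 506 observations and four coefficients (intercept, two main effects, one interaction), the residual degrees of freedom are 502.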
Finally, we can display how the estimated effect of log(lstat) varies across the models using spec_curve().
# labs() needs a named aesthetic; this assumes the colour legend marks significance
spec_summary(mv, var = "log(lstat)") |>
  spec_curve(label = "code") +
  ggplot2::labs(colour = "Significant at 0.05")
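A specification curve is built from one estimate per model, usually with a confidence interval, sorted by effect size. The base R sketch below assembles such a table for the log(lstat) coefficient. It illustrates the idea only and is not the output format of spec_summary(); `spec_table` and its column names are assumptions made for this example.

```r
# Sketch: the kind of per-model table a specification curve is drawn from.
boston <- MASS::Boston

candidate_formulas <- list(
  medv ~ log(lstat) * rm,
  medv ~ log(lstat) * tax,
  medv ~ log(lstat) * tax * rm
)

spec_table <- do.call(rbind, lapply(seq_along(candidate_formulas), function(i) {
  fit <- lm(candidate_formulas[[i]], data = boston)
  # 95% confidence interval for the log(lstat) coefficient only.
  ci <- confint(fit, "log(lstat)", level = 0.95)
  data.frame(
    universe  = i,
    estimate  = unname(coef(fit)["log(lstat)"]),
    conf_low  = ci[1, 1],
    conf_high = ci[1, 2],
    p_value   = summary(fit)$coefficients["log(lstat)", "Pr(>|t|)"]
  )
}))

# Sort by estimate, as a specification curve plot would.
spec_table <- spec_table[order(spec_table$estimate), ]
spec_table
```

Because each model includes interactions with log(lstat), its main-effect coefficient is the slope when the interacting variables equal zero, so the estimates are not directly comparable in magnitude across formulas.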