library(learnr) knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
Imagine you've ran several models. You:
You have 8 models in total, stored as a list. We should add more details to each
model to differentiate them from each other. Use map()
from purrr
to wrangle
each model simultaneously. The new code here, term[2]
, selects the main
predictor, which is the second element in the term
column.
capture.output({ data("unadjusted_models_list", package = "acdcourse") data("adjusted_models_list", package = "acdcourse") library(dplyr) library(purrr) }, file = tempfile())
Instructions:
map()
, add a model
column to indicate the models are "Unadjusted"
.# Add predictor and model type to each list item unadjusted_models_list <- ___( unadjusted_models_list, # .x is purrr for "model goes here" ~mutate(.x, # This selects predictor, not confounder predictor = term[2], # Indicate model "adjustment" model = ___ ))
'Add the adjustment column using `model = "Unadjusted"`.'
# Add predictor and model type to each list item unadjusted_models_list <- map( unadjusted_models_list, # .x is purrr for "model goes here" ~mutate(.x, # This selects predictor, not confounder predictor = term[2], # Indicate model "adjustment" model = "Unadjusted" ))
"Nice!"
Instructions:
"Adjusted"
model list.# Add predictor and model type to each list item adjusted_models_list <- map( adjusted_models_list, # .x is purrr for "model goes here" ~mutate(.x, # This selects predictor, not confounder predictor = term[2], # Indicate model "adjustment" model = ___ ))
"Refer to each list object in `map` with `.x`." "Include the `~` before `mutate`."
# Add predictor and model type to each list item adjusted_models_list <- map( adjusted_models_list, # .x is purrr for "model goes here" ~mutate(.x, # This selects predictor, not confounder predictor = term[2], # Indicate model "adjustment" model = "Adjusted" ))
"Excellent! You made use of map to add more details to each model object."
The most efficient approach to later plotting and creating tables is to have all models in a single data frame. You've already prepared them a bit, now it's time to combine them together so you can continue working with them.
Instructions:
bind_rows()
to combine unadjusted_models_list
and adjusted_models_list
.outcome
column with the value "got_cvd"
.filter()
to keep only conditions where the predictor
equals
term
and effect
equals "fixed"
.capture.output({ data("unadjusted_models_list", package = "acdcourse") data("adjusted_models_list", package = "acdcourse") library(dplyr) library(purrr) unadjusted_models_list <- map( unadjusted_models_list, ~mutate(.x, predictor = term[2], model = "Unadjusted") ) adjusted_models_list <- map( adjusted_models_list, ~mutate(.x, predictor = term[2], model = "Adjusted") ) }, file = tempfile())
all_models <- bind_rows( # Combine the two lists of models ___, ___ ) %>% mutate(outcome = ___) %>% # Keep only predictor rows and fixed effects filter(___ == , ___ == ___)
"Filter when predictor and term are the same (`predictor == term`)."
all_models <- bind_rows( # Combine the two lists of models unadjusted_models_list, adjusted_models_list ) %>% mutate(outcome = "got_cvd") %>% # Keep only predictor rows and fixed effects filter(predictor == term, effect == "fixed")
"Well done! You've now bound all models together and continued wrangling them."
Statistical analysis used on cohort data usually output some time of regression estimate along with a measure of uncertainty (e.g. 95% confidence interval). Sometimes it makes sense to present these results in a table, but often the better approach is to create a figure instead. Figures show magnitude, direction, uncertainty, and comparison of results very effectively.
Create a plot of the unadjusted model results that highlights the estimate and uncertainty of the estimate.
capture.output({ data("all_models", package = "acdcourse") library(dplyr) library(ggplot2) }, file = tempfile())
Instructions:
model
that is equal to "Unadjusted"
.# Keep only unadjusted models unadjusted_results <- all_models %>% filter(___ == ___)
'Filter should be `model == "Unadjusted"`.'
# Keep only unadjusted models unadjusted_results <- all_models %>% filter(model == "Unadjusted")
"Great!"
Instructions:
y
as predictor
, x
as estimate
, xmin
as conf.low
, and xmax
as
conf.high
.geom_point()
and geom_errorbarh()
layers.unadjusted_results <- all_models %>% filter(model == "Unadjusted") # Create a dot and error bar plot unadjusted_results %>% ggplot(aes(y = ___, x = ___, xmin = ___, xmax = ___)) + ___() + ___()
"The `errorbarh` (horizontal) requires `xmin = conf.low, xmax = conf.high`."
unadjusted_results <- all_models %>% filter(model == "Unadjusted") # Create a dot and error bar plot unadjusted_results %>% ggplot(aes(y = predictor, x = estimate, xmin = conf.low, xmax = conf.high)) + geom_point() + geom_errorbarh()
"Great!"
Instructions:
geom_vline()
to add a vertical line, setting xintercept
as 1 for the
"center line".unadjusted_results <- all_models %>% filter(model == "Unadjusted") # Create a dot and error bar plot unadjusted_results %>% ggplot(aes(y = predictor, x = estimate, xmin = conf.low, xmax = conf.high)) + geom_point() + geom_errorbarh() + # Add vertical line ___(xintercept = ___)
"Set `xintercept = 1` for the center line."
unadjusted_results <- all_models %>% filter(model == "Unadjusted") # Create a dot and error bar plot unadjusted_results %>% ggplot(aes(y = predictor, x = estimate, xmin = conf.low, xmax = conf.high)) + geom_point() + geom_errorbarh() + # Add vertical line geom_vline(xintercept = 1)
"Excellent! See how this graph shows the uncertainty around individual model estimates? This is an effective way of presenting results from models."
Now that we've created this plot, let's polish it up. We want it to be "publication quality", since we'll eventually present this figure to others.
As with the previous exercise, use the unadjusted_results
data frame you
created to plot the findings. This time, make the plot more polished and
presentable.
Instructions:
size
to 3, the error bar height
to 0.1, and the linetype
to "dotted"
."Predictors"
on the y and "Odds Ratio (95%
CI)"
on the x). Recall that CI means confidence interval.theme_classic()
.capture.output({ data("all_models", package = "acdcourse") library(dplyr) library(ggplot2) unadjusted_results <- all_models %>% filter(model == "Unadjusted") }, file = tempfile())
# Make the plot more polished unadjusted_results %>% ggplot(aes(y = predictor, x = estimate, xmin = conf.low, xmax = conf.high)) + geom_point(size = ___) + geom_errorbarh(height = ___) + geom_vline(xintercept = 1, linetype = ___) + labs(y = ___, x = ___) + # Set the theme ___()
'The labels should be of the form `x = "Axis Label"` (e.g. for the x-axis).'
# Make the plot more polished unadjusted_results %>% ggplot(aes(y = predictor, x = estimate, xmin = conf.low, xmax = conf.high)) + geom_point(size = 3) + geom_errorbarh(height = 0.1) + geom_vline(xintercept = 1, linetype = "dotted") + labs(y = "Predictors", x = "Odds ratio (95% CI)") + # Set the theme theme_classic()
"Amazing! You have a very nice figure now that is ready to be presented to others! There are other themes to use if you don't like this one. Check out the package ggthemes for more."
The STROBE best practices indicate to show both "crude" (unadjusted) and adjusted model results. Showing both can be informative and insightful into the research question. Create a plot of your results showing both unadjusted and adjusted models. Do the same steps as in the previous exercise for creating the plot.
Instructions:
all_models
data frame this time.facet_grid()
to split the plot by rows
with the model
variable (called inside vars()
).capture.output({ data("all_models", package = "acdcourse") library(ggplot2) library(magrittr) }, file = tempfile())
# Show results of both adjusted and unadjusted ___ %>% ggplot(aes(y = predictor, x = estimate, xmin = conf.low, xmax = conf.high)) + geom_point(size = 3) + geom_errorbarh(height = 0.1) + geom_vline(xintercept = 1, linetype = "dotted") + # Facet plot by model ___(rows = vars(___)) + labs(y = "Predictors", x = "Odds ratio (95% CI)") + theme_classic()
"The faceting variable should be called as `vars(model)`."
# Show results of both adjusted and unadjusted all_models %>% ggplot(aes(y = predictor, x = estimate, xmin = conf.low, xmax = conf.high)) + geom_point(size = 3) + geom_errorbarh(height = 0.1) + geom_vline(xintercept = 1, linetype = "dotted") + # Split plot by model facet_grid(rows = vars(model)) + labs(y = "Predictors", x = "Odds ratio (95% CI)") + theme_classic()
"Great job! You can see that there are large differences in some of the results between unadjusted and adjusted models."
A classic use for tables is showing the basic characteristics of a cohort dataset, as there are diverse data types and summary statistics that need to be shown. Including a basic participant characteristics table is part of the STROBE best practices. This table can be quite informative for others when they interpret your analysis results.
Using the carpenter
package, create a table showing summary statistics for
each data collection visit.
capture.output({ data("tidied_framingham", package = "acdcourse") library(carpenter) library(dplyr) }, file = tempfile())
Instructions:
"followup_visit_number"
for the header
argument of outline_table()
.# Create a table of summary statistics tidied_framingham %>% # These discrete variables are numeric, but must be factors mutate_at(vars(followup_visit_number, got_cvd), as.factor) %>% # Set followup visit number as table column outline_table(header = ___)
"Use quotes around `followup_visit_number`."
# Create a table of summary statistics tidied_framingham %>% # These discrete variables are numeric, but must be factors mutate_at(vars(followup_visit_number, got_cvd), as.factor) %>% # Set followup visit number as table column outline_table(header = "followup_visit_number")
"Nice!"
Instructions:
"got_cvd"
, "sex"
, and "education_combined"
variables,
using stat_nPct
for the stat
argument.# Create a table of summary statistics tidied_framingham %>% mutate_at(vars(followup_visit_number, got_cvd), as.factor) %>% outline_table(header = "followup_visit_number") %>% # Show n (%) for discrete variables as rows add_rows(c(___, ___, ___), stat = ___)
"The variables should be quoted, e.g. `"sex"`."
# Create a table of summary statistics tidied_framingham %>% mutate_at(vars(followup_visit_number, got_cvd), as.factor) %>% outline_table(header = "followup_visit_number") %>% # Show n (%) for discrete variables as rows add_rows(c("got_cvd", "sex", "education_combined"), stat = stat_nPct)
"Great!"
Instructions:
"total_cholesterol"
, "body_mass_index"
, and
"participant_age"
using stat_medianIQR
for the stat
argument.# Create a table of summary statistics tidied_framingham %>% mutate_at(vars(followup_visit_number, got_cvd), as.factor) %>% outline_table(header = "followup_visit_number") %>% add_rows(c("got_cvd", "sex", "education_combined"), stat = stat_nPct) %>% # Show median (range) for continuous variables add_rows(c(___, ___, ___), stat = ___)
"The variables need to be surrounded by quotes, just like the function above."
# Create a table of summary statistics tidied_framingham %>% mutate_at(vars(followup_visit_number, got_cvd), as.factor) %>% outline_table(header = "followup_visit_number") %>% add_rows(c("got_cvd", "sex", "education_combined"), stat = stat_nPct) %>% # Show median (range) for continuous variables add_rows(c("total_cholesterol", "body_mass_index", "participant_age"), stat = stat_medianIQR)
"Great!"
Instructions:
"header"
to "Measures"
, "Baseline"
, "Second
followup"
, and "Third followup"
, then build_table()
to markdown format.tidied_framingham %>% mutate_at(vars(followup_visit_number, got_cvd), as.factor) %>% outline_table(header = "followup_visit_number") %>% add_rows(c("got_cvd", "sex", "education_combined"), stat = stat_nPct) %>% add_rows(c("participant_age", "body_mass_index", "total_cholesterol"), stat = stat_medianIQR) %>% # Rename headers to better titles renaming("header", c(___, ___, ___, ___)) %>% # Build the table and convert to markdown form ___()
"The new column headers should be given as a character vector."
tidied_framingham %>% mutate_at(vars(followup_visit_number, got_cvd), as.factor) %>% outline_table(header = "followup_visit_number") %>% add_rows(c("got_cvd", "sex", "education_combined"), stat = stat_nPct) %>% add_rows(c("participant_age", "body_mass_index", "total_cholesterol"), stat = stat_medianIQR) %>% # Rename headers to better titles renaming("header", c("Measures", "Baseline", "Second followup", "Third followup")) %>% # Build the table and convert to markdown form build_table()
"Nice job! You've gotten the data formatted as a table for easy inclusion in a document or report and have provided basic participant characteristics from each cohort visit."
While the main messaging and presentation of results should emphasize figures over tables, often it is useful to other researchers (especially those doing meta-analyses or aggregating results) that the raw model results be given as well. Here we can use tables to give this data, as a supplement to the figure.
Provide the estimates and confidence intervals of the unadjusted and adjusted
model results in a table format that you could include in a document or report.
The packages glue
, stringr
, and knitr
have been loaded.
capture.output({ data("all_models", package = "acdcourse") library(knitr) library(glue) library(dplyr) library(tidyr) library(stringr) }, file = tempfile())
Instructions:
estimate
, conf.low
, and conf.high
to 2 digits using the
function round
(don't use with ()
).# Prepare the results for the table table_model_results <- all_models %>% # Round values of variables to 3 mutate_at(vars(___, ___, ___), ___, digits = ___)
"`mutate_at()` takes variables (as the first argument) inside `vars()` and applies a function like `round` as the second argument."
# Prepare the results for the table table_model_results <- all_models %>% # Round values of variables to 3 mutate_at(vars(estimate, conf.low, conf.high), round, digits = 2)
"Great!"
Instructions:
glue()
to create a new variable in the form estimate (conf.low,
conf.high)
, then replace underscores with spaces in predictor
with
str_replace_all()
.# Prepare the results for the table table_model_results <- all_models %>% mutate_at(vars(estimate, conf.low, conf.high), round, digits = 2) %>% # Use glue function to combine variables mutate(estimate_ci = glue("{___} ({___}, {___})"), # Underscores to spaces in predictor predictor = str_replace_all(___, "_", " "))
"The estimate and CI variables should be placed inside the `{}` in `glue()`."
# Prepare the results for the table table_model_results <- all_models %>% mutate_at(vars(estimate, conf.low, conf.high), round, digits = 2) %>% # Use glue function to combine variables mutate(estimate_ci = glue("{estimate} ({conf.low}, {conf.high})"), # Underscores to spaces in predictor predictor = str_replace_all(predictor, "_", " "))
"Amazing!"
Instructions:
model
, predictor
, and estimate_ci
variables, use spread
on
model
and estimate_ci
.kable()
.# Prepare the results for the table all_models %>% mutate_at(vars(estimate, conf.low, conf.high), round, digits = 2) %>% mutate(estimate_ci = glue("{estimate} ({conf.low}, {conf.high})"), predictor = predictor %>% str_remove("_scaled") %>% str_replace_all("_", " ")) %>% # Keep then spread variables for final table select(___, ___, ___) %>% spread(___, ___) %>% # Create a Markdown table kable(caption = "Estimates and 95% CI from all models.")
"`spread` takes two arguments: 1) the current discrete column (`model`) that will be the new columns names, and 2) the values (`estimate_ci`) that will be in the new columns."
# Prepare the results for the table all_models %>% mutate_at(vars(estimate, conf.low, conf.high), round, digits = 2) %>% mutate(estimate_ci = glue("{estimate} ({conf.low}, {conf.high})"), predictor = predictor %>% str_remove("_scaled") %>% str_replace_all("_", " ")) %>% # Keep then spread variables for final table select(model, predictor, estimate_ci) %>% spread(model, estimate_ci) %>% # Create a Markdown table kable(caption = "Estimates and 95% CI from all models.")
"Amazing! You've wrangled the data and prepared it to be presented as a table! You can now easily add this to your manuscript (especially easy if you use R Markdown)."
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.