Now that we've conducted our meta-analyses, it's time to write the paper. This vignette covers three topics that help with that: functions for summary statistics, a function that help you turn your database into a bibliography, and some ggplot2-based visualizations that we've found useful.
Once again, we'll illustrate with data from two previous meta-analyses: sv_data
from "Preventing Sexual Violence —A Behavioral Problem Without a Behaviorally-Informed Solution" and contact_data
from "The Contact Hypothesis Re-evaluated."
sum_lm
sum_lm
takes the built in R functions summary(lm())
and lets you use them in a sequence of pipes. In particular this is useful when you want to test for the magnitude of a relationship between two variables in many different subsets of data at once.
library(PaluckMetaSOP) sv_data |> sum_lm() # now in different subsets library(purrr) sv_data |> split(~study_design) |> map(sum_lm)
sum_tab
sum_tab
, like sum_lm
, takes a built in function (table
) and turns it into something that can be integrated into a sequence of pipes. Again, this is useful for when you want to see many tables at once, or put a few tables into a bigger table.
library(dplyr) # let's say you want to see the table of `behavior_type` in `sv_data # you can do table(sv_data$behavior_type) # the sum_tab version sv_data |> sum_tab(behavior_type) # where this really becomes useful is something like: PaluckMetaSOP::sv_data |> split(~study_design) |> map(~ sum_tab(., behavior_type)) |> bind_rows(.id = "study_design") ## TODO: fix this so NAs come out as zeroes
study_count
This simple function counts how many studies are in a dataset or in different subsets of your data. It assumes your dataset has a variable called unique_study_id
.
sv_data |> study_count() sv_data |> split(~study_design) |> map(study_count) |> bind_rows(.id = "study_design")
This function isn't quite "ready to go" enough to make it into the package as an independent function, but the basic idea is to take a list of DOIs and turn it into a bibliography. It is a wrapper around RefManageR::WriteBib
that enables you to turn a list of DOI into a .bib file with an bibTeX entry for each DOI.
To make use of this, you would include a column of DOIs or URLs for every study in your database. Neither of this package's built-in datasets has this, but let's use a different dataset to demonstrate.
We'll also set eval
to F
so it doesn't continually re-run when the vignette builds (running this requires an API call).
dat_with_dois <- structure(list(author = c("Abrahamse", "Alblas", "Allen", "Berndsen", "Berndsen", "Berndsen", "Bianchi", "Coker", "Feltz", "Feltz", "Feltz", "Feltz", "Jalil", "Lacroix", "Piester", "Piester", "Piester", "Sparkman", "Sparkman", "Sparkman", "Sparkman", "Sparkman", "Sparkman", "Sparkman"), year = c(2007L, 2023L, 2002L, 2005L, 2005L, 2005L, 2022L, 2022L, 2022L, 2022L, 2022L, 2022L, 2023L, 2020L, 2020L, 2020L, 2020L, 2017L, 2017L, 2021L, 2021L, 2021L, 2021L, 2021L ), doi = c("10.1016/j.jenvp.2007.08.002", "10.1080/17524032.2022.2149587", "10.1006/appe.2001.0474", "10.1016/j.appet.2004.10.003", "10.1016/j.appet.2004.10.003", "10.1016/j.appet.2004.10.003", "10.1093/ajcn/nqab414", "10.1016/j.appet.2021.105824", "10.1016/j.appet.2022.105981", "10.1016/j.appet.2022.105981", "10.1016/j.appet.2022.105981", "10.1016/j.appet.2022.105981", "10.1038/s43016-023-00712-1", "10.1016/j.foodqual.2020.103997", "10.1016/j.appet.2020.104842", "10.1016/j.appet.2020.104842", "10.1016/j.appet.2020.104842", "10.1177/0956797617719950", "10.1177/0956797617719950", "10.1016/j.jenvp.2021.101592", "10.1016/j.jenvp.2021.101592", "10.1016/j.jenvp.2021.101592", "10.1016/j.jenvp.2021.101592", "10.1016/j.jenvp.2021.101592")), class = "data.frame", row.names = c(NA, -24L)) manage_references <- function(dat_with_dois, bib_file = './refs.bib') { # Write bibs into bib file for (entry in dat_with_dois$doi) { RefManageR::WriteBib(bib = RefManageR::GetBibEntryWithDOI(doi = entry), file = bib_file, append = TRUE) } # Function to add DOI to the BibTeX file add_doi_to_bib <- function(new_doi) { RefManageR::WriteBib(bib = RefManageR::GetBibEntryWithDOI(new_doi), file = bib_file, append = TRUE) } # Return the function to add DOIs to the BibTeX file return(list(add_doi_to_bib = add_doi_to_bib)) } manage_references(dat_with_dois = dat_with_dois)
Forest plots are a staple of meta-analyses. Many R packages have functions to create them, e.g. metafor::forest
. However, our meta-analytic figures typically add color coding to denote additional information about (e.g.) study design, theoretical category, or outcome type. There was no easy way to do that with existing functions, so here's one approach to doing that in ggplot2 (and some supplementary plotting libraries as well).
library(ggplot2) library(ggtext) model <- contact_data |> map_robust() bold_labels <- function(x) { ifelse(x == "RE Estimate", "<b>RE Estimate</b>", x) } plot_dat <- as.data.frame(contact_data) |> mutate(lower_bound = d - (1.96 * se_d), upper_bound = d + (1.96 * se_d)) |> select(name_short, d, se_d, lower_bound, upper_bound, target_spelled_out) |> add_row(name_short = "RE Estimate", d = model$Delta, lower_bound = model$Delta - (1.96 * model$se), upper_bound = model$Delta + (1.96 * model$se), target_spelled_out = NA) # Get unique study names excluding "RE Estimate" unique_studies <- unique(plot_dat$name_short[plot_dat$name_short != "RE Estimate"]) # Append "RE Estimate" to the end of the list ordered_levels <- c(unique_studies, "RE Estimate") # Set this order to name_short plot_dat$name_short <- reorder(factor(plot_dat$name_short, levels = ordered_levels), desc(plot_dat$se_d)) plot_dat |> ggplot(aes(x = d, y = name_short)) + geom_point(data = subset(plot_dat, name_short == "RE Estimate"), shape = 18) + # shape = 5 for a transparent diamond geom_point(data = subset(plot_dat, name_short != "RE Estimate"), aes(color = target_spelled_out), shape = 18) + geom_errorbarh(data = subset(plot_dat, name_short != "RE Estimate"), aes(xmin = lower_bound, xmax = upper_bound, color = target_spelled_out), height = .1) + geom_vline(xintercept = 0, color = "black", alpha = .5) + geom_vline(xintercept = model$Delta, color = 'black', lty = 'dashed') + theme_minimal() + theme(axis.text.y = element_markdown()) + # Apply HTML formatting to y-axis text scale_y_discrete(labels = bold_labels) + # Use custom function for y-axis labels scale_x_continuous(name = expression(paste("Glass's", " ", Delta))) + labs(color = "Target of Prejudice") + ylab("Study") + ggtitle("Contact hypothesis forest plot") + theme(plot.title = element_text(hjust = 0.5, face = "bold"), axis.line = element_line(colour = "black"))
We didn't actually use this code in "The Contact Hypothesis Re-evaluated", but we probably would have if we were writing that paper now.
A standard check for publication bias is to see if effect size is positively correlated with the size of the standard error. The idea is that in the presence of publication bias, smaller studies are more likely to get shelved if they produce null results, whereas large, well-powered studies are more likely to get published no matter what. In that case, you'd expect for small studies to be an inaccurate estimate of the true population effect size, and for larger studies to produce systematically smaller estimates.
Here's a slightly simplified version of figure 1 from "The Contact Hypothesis Re-Evaluated:"
library(ggrepel) shape_orders <- c(19, 17, 15, 18) target_labels <- c("Age", "Disability", "Foreigners", "Gender", "LGBT", "Race", "Religion" ) ggplot(contact_data, aes(x = se_d, y = d, label = name_short)) + geom_hline(yintercept = 0, lty = "dashed", colour = "grey") + geom_point(aes(shape = factor(pop), color = target_spelled_out), size = 3, alpha = NA) + scale_shape_manual(values = shape_orders, name = "Population", labels = c("Children grades 4-8","High school students", "College-aged subjects", "Adults over 25")) + stat_smooth(aes(fill = NULL), lty = "dashed", fullrange = TRUE, method = "lm", show.legend = FALSE, alpha = .1) + geom_label_repel(size = 3) + xlab("Standard Errors") + ylab("Effect Sizes") + ggtitle("Contact hypothesis publication bias plot") + labs(color = "Target of prejudice") + theme_bw()
This code produces figures that display descriptive characteristics of sv_data
for "Preventing Sexual Violence. As you can see, this required a lot of custom code to account for particular things about how we put the dataset together, so it won't perfectly generalize to a new project; but we think the overall ideas are worth preserving and communicating. (The following was all written by John-Henry Pezzuto.)
library(forcats) library(ggplot2) library(ggtext) library(ggthemes) library(patchwork) library(stringr) library(tidyr) country_fig = sv_data |> distinct(unique_study_id, country) |> mutate(country = ifelse(country == "USA", "USA", "Rest of the world")) |> count(country, sort = T) |> mutate(country = fct_rev(fct_reorder(factor(country), n)), perc = n / sum(n)) |> ggplot(aes(country, n, label = n, fill = country)) + geom_col() + theme_bw() + theme(axis.text = element_markdown(size = 12), axis.title = element_text(size = 14), legend.text = element_text(size = 14), plot.title = element_markdown()) + scale_y_continuous(breaks = seq(0, 300, by = 100), limits = c(0, 300)) + scale_fill_manual(values = solarized_pal()(2)) + guides(fill = "none") + labs(title = "**a)** Number of Studies by Country", x = NULL, y = "# of studies", col = NULL) setting_fig = sv_data |> distinct(unique_study_id, setting, setting2) |> pivot_longer(cols = c("setting", "setting2"), values_to = "setting") |> drop_na(setting) |> count(setting, sort = T) |> mutate(setting = fct_rev(fct_reorder(setting, n)), perc = n / sum(n)) |> ggplot(aes(setting, n, fill = setting)) + geom_col() + scale_fill_manual(values = ggthemes::tableau_color_pal('Tableau 10')(6)) + labs(title = "**b)** Number of Studies by Setting", x = NULL, y = "# of studies", col = NULL) + theme_bw() + guides(fill = "none") + theme(axis.text = element_markdown(size = 12), axis.title = element_text(size = 14), legend.text = element_text(size = 14), plot.title = element_markdown()) dat_participant_sex <- sv_data |> # participant sex lines mutate(participant_sex = if_else(participant_sex %in% c("male and female in mixed groups", "male and female in separate groups and also in combined groups", "male and female in separate groups", "male and female with group composition not specified", "general population"), "mixed gender groups", participant_sex)) |> distinct(year, unique_study_id, participant_sex) |> filter(participant_sex %in% c("mixed gender groups", "only male", "only female")) |> count(year, participant_sex) |> add_row(year = 1985, participant_sex = "mixed gender groups") |> complete(year, participant_sex, fill = list(n = 0)) |> mutate(participant_sex = str_to_title(participant_sex)) |> mutate(participant_sex = factor(participant_sex, levels = c("Mixed Gender Groups", "Only Male", "Only Female"))) tt_gender_fig = dat_participant_sex |> ggplot(aes(year, n, col = participant_sex)) + geom_line(linewidth = 2) + scale_x_continuous( limits = c(1985, 2018), breaks = c(1985, seq(1985, 2020, by = 5), 2018) ) + scale_color_manual(values = ggthemes::palette_pander(3)) + theme_bw() + theme( axis.text = element_markdown(size = 12), axis.title = element_text(size = 14), legend.text = element_text(size = 14), plot.title = element_markdown(), legend.position = "bottom" ) + labs( title = "**c)** Number of Studies Over Time by Group Gender (1985-2018)", x = NULL, y = "# of studies", col = NULL ) # The patchwork library enables this arrangement: country_fig / setting_fig / tt_gender_fig
For "Preventing Sexual Violence," we wanted to check whether changes in ideas were linked to changes in behavior. We first created a subset of our dataset with just studies that measured both categories, and then made a scatterplot that plotted changes in one dimension against changes in the other. The black line represents what a 1-to-1 correlation between the two outcomes would have looked like, and the gray line is what we actually observed.
sv_data |> filter(has_both == 'both') |> group_by(author, year, study_design, unique_study_id, scale_type) |> filter(delay == min(delay)) |> summarise(mean_d = mean(d), mean_var_d = mean(var_d), mean_se_d = mean(se_d)) |> pivot_wider(id_cols = c(author, year, unique_study_id, study_design), names_from = c(scale_type), values_from = c(mean_d, mean_var_d, mean_se_d)) |> ggplot(aes(x = mean_d_ideas, y = mean_d_behavior)) + geom_point(aes(color = study_design), size = 3) + geom_abline(slope = 1, lty = 'dashed') + geom_smooth(lty = "dashed", method = "lm", se = FALSE, color = 'grey') + labs(title = "Correlation between ideas and behavioral change", x = "Effect size (ideas)", y = "Effect size (behaviors)", color = "Study Design") + guides(color = guide_legend(title = "")) + theme_bw() + theme(plot.title = element_text(hjust = 0.5), axis.text = element_text(size = 14), axis.title = element_text(size = 14), legend.text = element_text(size = 14), legend.title = element_text(size = 14), legend.position = "bottom", legend.box = "vertical")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.