knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
Now that we have our vectors of ∆s, variances, and standard errors for each paper, it's time to perform a meta-analysis, i.e., the statistical procedure by which the results of many studies are pooled.
The main function in this vignette is called map_robust. We'll demonstrate how to use it with our built-in datasets, primarily sv_data. This is the dataset for "Preventing Sexual Violence—A Behavioral Problem Without a Behaviorally-Informed Solution". (We'll also use contact_data from "The Contact Hypothesis Re-evaluated" for a few examples.)
map_robust is a wrapper around a pre-existing function from the metafor package called robust, which is itself a wrapper around the function rma. See the man pages for those functions for full details.
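For example, from the R console:

# Open the help pages for the underlying metafor functions:
?metafor::rma
?metafor::robust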
map_robust takes a dataset and returns a data frame with an N for the number of data points; an N for the number of studies; a ∆ signifying the pooled effect size; a standard error; and a p-value.
library(PaluckMetaSOP)

sv_data |> map_robust()
This is equivalent to running the following:
library(metafor)

robust(x = metafor::rma(yi = sv_data$d, vi = sv_data$var_d),
       cluster = sv_data$unique_study_id)
Our function is different in a few respects. First, it's designed to take in a dataset via a piped operation (|>), which was actually the main reason we wrote it in the first place (we were in the habit of using a lot of pipes). Second, so long as your dataset contains variables called d, var_d, and unique_study_id, it's a lot easier to type. Third, it returns a truncated output that corresponds to the information we were actually recording in our results sections.
Our meta-analyses cluster at the level of study. You could also cluster at the level of a paper, or of a team of authors, or at no level at all. Also, if you run our function on a dataset where each unique_study_id corresponds to only one row in the dataset, it's equivalent to not clustering. We'll demonstrate that with PaluckMetaSOP::contact_data:
library(dplyr)

contact_data |>
  select(-unique_study_id) |> # remove unique study id to reconstruct it
  group_by(name_short) |>
  mutate(unique_study_id = cur_group_id()) |>
  map_robust()
#' note: this Delta is a little different than the one we report in TCHR
#' we used Stata at the time to calculate the pooled average effect size
#' (and probably a few other small differences)

# now robust analysis
robust(x = metafor::rma(yi = contact_data$d, vi = contact_data$var_d),
       cluster = contact_data$unique_study_id)

# same as if we drop the clustering information
rma(yi = d, vi = var_d, data = contact_data)
If you want to cluster at a different level, or call your cluster variable something different and modify the function accordingly, go ahead! Every meta-analysis is different, and we encourage you to modify these functions to serve your own needs. (We typically have a functions folder in our replication archives where we keep all our custom functions, which is a nice way to keep your function files organized separately from your scripts.)
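For example, an analysis script might begin with something like this (the file names here are hypothetical):

# Load custom helpers kept in a separate folder (hypothetical file names):
source("functions/map_robust.R")
source("functions/make_tables.R")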
The second way we use map_robust is via subgroup analyses. For instance, in "Preventing Sexual Violence," we wanted to compare the results on ideas-based outcomes to our major categories of behavioral outcomes (perpetration, victimization, bystander behaviors, and involvement behaviors). To do this, we use the map function from the purrr library to split the dataset and apply a function (Hadley Wickham calls this "The Split-Apply-Combine Strategy for Data Analysis").
library(purrr)

sv_data |>
  split(~behavior_type) |>
  map(map_robust) |>
  bind_rows()
A note about this: if your "group" (the thing you are running split() on) has only one data point, the resulting numbers won't be a pooled average effect, but instead will just be the ∆, se, and p-value of that individual estimate. If this comes up, you should explain that to your readers in a note on the table.
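One way to check for this ahead of time is to count the number of effect sizes in each group (a quick sketch using dplyr):

# Groups with n == 1 will return a single estimate rather than a pooled one.
sv_data |>
  count(behavior_type) |>
  filter(n == 1)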
If we apply a few more functions, we can actually get pretty close to a publication-ready table. dplyr::bind_rows() combines multiple data frames (in this case, the list of data frames returned by map) into a single data frame; its .id argument adds a column recording which subgroup each row came from.
sv_data |>
  split(~behavior_type) |>
  map(map_robust) |>
  bind_rows(.id = "behavior_type")
You can turn this into a variety of formats, e.g. LaTeX or Markdown, via knitr::kable():
library(knitr)

sv_data |>
  split(~behavior_type) |>
  map(map_robust) |>
  bind_rows() |>
  kable('markdown')
Here is a small function that adds significance stars corresponding to p-values:
get_significance_stars <- function(pval) {
  sapply(pval, function(x) {
    if (is.na(x)) {
      ""
    } else if (x < 0.001) {
      "***"
    } else if (x < 0.01) {
      "**"
    } else if (x < 0.05) {
      "*"
    } else {
      ""
    }
  })
}
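As a quick sanity check of the helper on some example p-values:

# Returns one star string per p-value; NA yields an empty string.
get_significance_stars(c(0.2, 0.04, 0.004, 0.0004, NA))
#> [1] ""    "*"   "**"  "***" ""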
Here we apply this function directly within the pipeline, and modify the columns to combine the ∆ and se into a single column with the format ∆ (se):
sv_data |>
  split(~behavior_type) |>
  map(map_robust) |>
  bind_rows(.id = "behavior_type") |>
  mutate(delta_se = sprintf("%.3f%s (%.3f)", Delta, get_significance_stars(pval), se)) |>
  select(behavior_type, N_studies, delta_se)
Finally, we can use the "great table" library (gt) to format everything nicely and add explanatory notes:
library(gt)

sv_data |>
  split(~behavior_type) |>
  map(map_robust) |>
  bind_rows(.id = "behavior_type") |>
  mutate(delta_se = sprintf("%.3f%s (%.3f)", Delta, get_significance_stars(pval), se)) |>
  select(behavior_type, N_studies, delta_se) |>
  gt() |>
  tab_header(title = "∆ by category of dependent variable") |>
  cols_label(
    behavior_type = "Behavior type",
    N_studies = "N (Studies)",
    delta_se = "∆ (se)"
  ) |>
  tab_source_note(source_note = "* < 0.05, ** < 0.01, *** < 0.001.")
gt has a function called as_latex() that will convert all of this into LaTeX code directly.
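For example, if you save the gt table above in an object (sv_table is a made-up name for illustration), you can write:

# Convert the formatted table to LaTeX code:
sv_table |> as_latex()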
(By the way, the above table-making code is a little more sophisticated than anything we did in our published meta-analyses to date, but the ideas for all this were germinating during "Preventing Sexual Violence.")
Writing your meta-analytic paper.