Intended Audience:

No technical background presumed, but if you know what Cohen's D is and why it's used, you can skip this.

Motivating Example:

Let's say you're interested in the auto-immune disease scleroderma, and you want to know which therapies work against it. You begin by searching academic databases for high-quality studies on the subject, i.e. double-blinded, randomized controlled trials. You find 15 of these, and luckily for you, each presents its main result as a positive effect on quality-adjusted life years (QALY) -- so it's straightforward to make an apples-to-apples comparison between them.

Scanning the abstracts, you find that 13 of the 15 papers claim that their therapies increase QALY by a statistically significant amount; the remaining two find significant results for women under 50. This sounds like good news! But because you're wary of publication bias, which is a particular problem for meta-analyses, you resolve to take a close look at each one rather than taking its results at face value.

To help with this, you put your notes about the 15 papers into a spreadsheet, where each row represents one study and each column is a piece of information about that study: the average age of the patients, the reported effect on QALY, how many people were in the treatment group, etc.
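In R, that spreadsheet might look something like the data frame below (the column names and values are just placeholders, not real studies):

```r
# A sketch of the study-level spreadsheet: one row per study, one column per
# piece of information about that study (all values here are made up).
studies <- data.frame(
  study_id     = c("Study A", "Study B", "Study C"),
  mean_age     = c(52, 48, 55),
  n_treated    = c(120, 50, 140),
  qaly_effect  = c(0.30, 0.85, 0.25),
  double_blind = c(TRUE, TRUE, FALSE)
)
studies
```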

Once you've read all the studies, you see that one therapy improved QALY by almost twice as much as the next closest -- but that study had just 50 patients in the treatment group, while most other studies had more than 100 treated patients. You also read a troubling footnote in a different paper alluding to difficulties in maintaining a double-blinded procedure (the pills it looked at are very distinctive, so the doctors might know on sight the difference between treatment and placebo).

When you write up your results, you argue that one therapy is the most promising, with the caveat that we'd be more confident about its effects if they were replicated with a larger sample. You also show readers a scatterplot, with a fitted line, of the reported QALY effect against sample size, showing that larger effects tend to be found in smaller studies (this is one way people check for publication bias). Finally, you end with an 'extensions and limitations' section that lays out your concerns about double-blinding and calls on future researchers to address them forthrightly.
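If you want to produce that kind of plot yourself, here's a minimal sketch in base R (the effect sizes and sample sizes are made up; in practice each point would be one study from your spreadsheet):

```r
# A sketch of plotting effect size against sample size, with a simple trend line
# (illustrative values only).
sample_size <- c(50, 80, 110, 150, 200, 260, 340)
qaly_effect <- c(0.90, 0.55, 0.50, 0.35, 0.30, 0.25, 0.20)

plot(sample_size, qaly_effect,
     xlab = "Treatment group size", ylab = "Reported QALY effect")
abline(lm(qaly_effect ~ sample_size))  # linear trend line
```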

And that's the gist of writing a meta-analysis.

Simplifying assumptions

In this example, each study:

- looked at the same, well-defined disease (scleroderma);
- measured the same outcome (QALY); and
- used the same high-quality design (a double-blinded, randomized controlled trial).

If you're looking at a literature where all three conditions hold, you're in luck -- making apples-to-apples comparisons between studies is going to be relatively easy. But none of these has been true for the three meta-analyses I've worked on.[^1] If you want to find what interventions reduce racism, for instance, you might find a lot of scholarly dispute about both what racism is and how to measure it (especially in a cross-country context). You'll also probably find studies with really different designs and aims -- one paper might be a longitudinal study with 5,000 adults but no randomization, and another could be a randomized study of 300 students in 10 classrooms. Synthesizing evidence on a question like this requires judgment calls about, e.g., which papers to include in your search (based on criteria like target population or study design) and which outcomes to report from them.

When you've done so, the next problem is creating a framework for comparing a lot of different outcomes. Returning to the example of 'what interventions reduce racism', a reviewer is likely to find a mix of Implicit Association Test (IAT) scores (which some researchers think are "not good predictors of ethnic or racial discrimination"), explicit attitude measures, and behavioral outcomes (e.g. cooperation in a prisoner's dilemma game). If one study produces an average change of 1 point on a 7-point attitude scale, and another shows the treatment group was 10% more likely to cooperate in a game, how do you average these, or say whether one effect is 'bigger' than the other?

Converting to a common statistical framework

Enter Cohen's D: take the mean difference in an outcome between two groups (sometimes called the Average Treatment Effect, or ATE) and divide it by the pooled standard deviation of that outcome. To make this concrete:[^2]
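Here is a minimal sketch in R, using made-up attitude-scale scores for a treatment and a control group (the numbers are purely illustrative, not taken from any study):

```r
# Cohen's D for a two-group comparison: mean difference / pooled standard deviation.
# Suppose both groups rated their attitudes on a 7-point scale (made-up data).
treat   <- c(5.8, 6.1, 5.5, 6.4, 5.9, 6.0, 5.7, 6.2)
control <- c(5.1, 5.4, 4.9, 5.6, 5.0, 5.3, 5.2, 4.8)

mean_diff <- mean(treat) - mean(control)   # the Average Treatment Effect (ATE)

# Pooled standard deviation of the outcome across both groups
n1 <- length(treat); n2 <- length(control)
sd_pooled <- sqrt(((n1 - 1) * var(treat) + (n2 - 1) * var(control)) / (n1 + n2 - 2))

cohens_d <- mean_diff / sd_pooled
cohens_d
```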

You then do this for every outcome of each study you look at.[^3] This enables you to aggregate all studies into estimates of, e.g., which interventions work best, the average treatment effect of a literature as a whole, the relationship between magnitude of effect size and the precision of each estimate, and anything else you want to know (and think the data can tell you).
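For example, one standard way (though not the only way) to combine studies once they're all on the Cohen's D scale is an inverse-variance weighted average, where each study's estimate is weighted by the inverse of its squared standard error (see footnote 3). A sketch with made-up effect sizes and sample sizes:

```r
# Pooling standardized effects across studies (illustrative values only).
# d: Cohen's D from each study; n1, n2: treatment and control group sizes.
d  <- c(0.45, 0.20, 0.60, 0.10)
n1 <- c(50, 120, 80, 200)
n2 <- c(50, 115, 75, 210)

# Approximate standard error of Cohen's D (a standard large-sample formula)
se <- sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))

# Inverse-variance (fixed-effect) weighted average across studies
w <- 1 / se^2
pooled_d  <- sum(w * d) / sum(w)
pooled_se <- sqrt(1 / sum(w))
c(pooled_d = pooled_d, pooled_se = pooled_se)
```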

What if a study doesn't report means and standard deviations?

Then you use the statistical information available to you to try to derive them. But this process is more fraught than I once realized, and one particular challenge of doing so -- estimating Cohen's D when all results are reported after control variables are accounted for -- is the subject of the next essay.
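Before getting to that harder case, here's the simplest kind of conversion: if a study reports a t-statistic from an independent two-group comparison along with the group sizes, Cohen's D can be recovered as d = t * sqrt(1/n1 + 1/n2). A sketch with made-up numbers:

```r
# Recovering Cohen's D from a reported t-statistic, assuming an independent
# two-group comparison (illustrative values only).
t_stat <- 2.4   # t-statistic reported by the study
n1 <- 60        # treatment group size
n2 <- 55        # control group size

d_from_t <- t_stat * sqrt(1 / n1 + 1 / n2)
d_from_t
```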

[^1]: The contact hypothesis re-evaluated (published July 2018); A systematic review and meta-analysis of primary prevention strategies for sexual violence perpetration (in the data-gathering phase); and another I'm currently working on as an RA.

[^2]: This example is a simplified version of work by Sohad Murrar, profiled nicely here.

[^3]: You would also calculate the standard error of each Cohen's D estimate, which measures the precision of that estimate.


