Intended background:

I assume you know what Cohen's D is, how it's calculated, the basics of linear regression and t-tests as well as controlling for variables.

Where we left off

In the previous vignette, I explained:

If every paper you look at has one treatment arm and one control group, and the mean response rates and variance for each dependent variable are clearly expressed, then you apply the simple equation \begin{equation} d = \frac{M_{2} - M_{1}}{SD} \end{equation} to every outcome of interest to get a key component of your meta-analytic dataset.
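As a quick illustration of that equation, here is a minimal R sketch. The function name `cohens_d` and its argument names are hypothetical, and the inputs are simply the population values used in the simulation further down. Which SD to divide by (pooled, control-group, or full-sample) is left to the caller.

```r
# Hypothetical helper: standardized mean difference from summary statistics
cohens_d <- function(m_treat, m_ctrl, sd_y) {
  (m_treat - m_ctrl) / sd_y
}

# Treatment mean 4.0, control mean 3.5, SD 0.5
d_example <- cohens_d(m_treat = 4.0, m_ctrl = 3.5, sd_y = 0.5)
d_example # 1
```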

Regressions and t-tests

Many papers do not present their results so straightforwardly. In some disciplines, it's common to present regression results or t-tests (or both) in lieu of means and standard deviations.

Let's work through an example. First, we'll construct an imaginary dataset where the true parameters are all known.

rm(list=ls())
set.seed(11111988) # for reproducibility

N <- 1000 # number of subjects
treatment_vector <- rbinom(N, 1, .5) # vector of 1s and 0s for treatment assignment
Y0 <- rep(NA, N) # outcome vector, filled in by treatment status
Y0[treatment_vector == 0] <- rnorm(n = sum(treatment_vector == 0), mean = 3.5, sd = 0.5) # control
Y0[treatment_vector == 1] <- rnorm(n = sum(treatment_vector == 1), mean = 4.0, sd = 0.5) # treatment
mean(Y0) # overall mean
mean(Y0[treatment_vector == 1]) - mean(Y0[treatment_vector == 0]) # true ATE

# Cohen's D, using the SD of the full sample:
(mean(Y0[treatment_vector == 1]) - mean(Y0[treatment_vector == 0])) / sd(Y0) # true Cohen's D

Bear in mind that we now know the true treatment effect (0.475) and the true value of Cohen's D (0.862).

Let's say the author just presents the output of the following regression as their results:

summary(lm(formula = Y0 ~ treatment_vector))

Could you recover the true ATE and Cohen's D from this? Yes, but imperfectly.
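For reference, the quantities a paper might report can be pulled out of a fitted model programmatically. The sketch below uses freshly simulated stand-in data (the names `treat`, `y`, and `fit` are hypothetical, and the numbers will not match the output above) and relies only on base R's `lm()` and `summary()`.

```r
# Hypothetical stand-in data; not the vignette's dataset
set.seed(1)
n <- 200
treat <- rbinom(n, 1, 0.5)
y <- rnorm(n, mean = 3.5 + 0.5 * treat, sd = 0.5)

fit <- lm(y ~ treat)
coefs <- coef(summary(fit)) # columns: Estimate, Std. Error, t value, Pr(>|t|)

beta   <- coefs["treat", "Estimate"]   # difference in group means
se     <- coefs["treat", "Std. Error"] # standard error of that difference
t_stat <- coefs["treat", "t value"]    # beta / se
f_stat <- unname(summary(fit)$fstatistic["value"]) # equals t_stat^2 with one predictor

n_t <- sum(treat == 1) # treatment group size
n_c <- sum(treat == 0) # control group size
```

With a single binary predictor, the coefficient is the difference in group means and the F statistic is the square of the t statistic, which is why any one of these numbers plus the group sizes can stand in for the others.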

First, there is a standard formula for converting a t statistic to Cohen's D: \begin{equation} d = t \sqrt{\frac{n_t + n_c}{n_t n_c}} \end{equation}

Expressed in code, this looks something like:

d <- round(t * sqrt((n_t + n_c) / (n_t * n_c)), digits = 3)

I've coded this, and a few other equations, into the PrejMeta::ResultsStandardizeR() function:

library(PrejMetaFunctions)
ResultsStandardizeR(eff_type = 't_test', u_s_d = 15.09, n_t = 491, n_c = 509)

0.955 is the estimate of Cohen's D from this equation, whereas the true value is 0.862. I would call this 'in the right ballpark' but not nearly as good as actually having the means and the standard deviation.
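The 0.955 figure can be checked by hand without the package. This is a minimal sketch: `t_to_d` is a hypothetical helper written for this check, not the internals of ResultsStandardizeR().

```r
# Hypothetical helper: convert a two-group t statistic to Cohen's D
t_to_d <- function(t, n_t, n_c) {
  t * sqrt((n_t + n_c) / (n_t * n_c))
}

# Values reported above: t = 15.09, 491 treated, 509 control
d_from_t <- t_to_d(t = 15.09, n_t = 491, n_c = 509)
round(d_from_t, digits = 3) # 0.955
```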

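How about the F test? In this particular scenario, the F statistic is just the square of the t statistic (15.09^2 = 227.7081), and the equation for converting it to Cohen's D is the same as the one for t, with F placed inside the square root: \begin{equation} d = \sqrt{\frac{F (n_t + n_c)}{n_t n_c}} \end{equation} Because F is a square, the sign of the effect has to be taken from elsewhere in the paper.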

ResultsStandardizeR(eff_type = 'f_test', u_s_d = 227.70, n_t = 491, n_c = 509)

Same answer.
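The agreement follows from F being the square of t. Here is a small sketch, assuming the usual two-group conversion d = sqrt(F (n_t + n_c) / (n_t n_c)); `f_to_d` is a hypothetical helper, not the package's own code.

```r
# Hypothetical helper: convert a two-group F statistic to (unsigned) Cohen's D
f_to_d <- function(f, n_t, n_c) {
  sqrt(f * (n_t + n_c) / (n_t * n_c))
}

d_from_f <- f_to_d(f = 227.70, n_t = 491, n_c = 509)
d_from_t <- 15.09 * sqrt((491 + 509) / (491 * 509))

# Both round to the same value; tiny differences come from rounding F to 227.70
round(c(from_f = d_from_f, from_t = d_from_t), digits = 3)
```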

What if you don't have t-tests or F-tests, but just regression coefficients and their standard errors?

Looking at the results above, you'll see that the coefficient associated with treatment_vector is exactly equivalent to the true ATE (mean(Y0[treatment_vector==1]) - mean(Y0[treatment_vector == 0])), so to calculate Cohen's D, you need a way of calculating the SD of the dependent variable. What you have is standard errors. The standard error of a mean can be converted to a standard deviation via \begin{equation} SD = SE \sqrt{N} \end{equation}.
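To make the direction of that conversion concrete, here is a minimal sketch on a hypothetical single sample (the vector `x` is made up for illustration). It covers only the standard error of one group's mean; whether a given regression standard error is of that kind is a separate question.

```r
# Hypothetical single-group sample
set.seed(2)
x <- rnorm(400, mean = 3.5, sd = 0.5)
n <- length(x)

se_mean <- sd(x) / sqrt(n)   # standard error of the mean
sd_back <- se_mean * sqrt(n) # multiplying by sqrt(n) recovers the SD

all.equal(sd_back, sd(x)) # TRUE
```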

There are 509 subjects in the control group (length(Y0[treatment_vector==0])) and 491 in the treatment group. The regression output reports an SE of 0.02207 for the intercept (the control group) and 0.03149 for the treatment coefficient. The true SD of Y0, sd(Y0), is 0.55146. The SD for the control group is 0.5, and for the treatment group, 0.496. Let's see how close we can get to that via the available information:

# from the control group:
0.02207 * sqrt(509) # 0.4979219 -- very close to the true value of 0.5

# from the treatment group:
0.03149 * sqrt(491) # 0.6978 -- not so close.

# average the two together?
((0.02207 * sqrt(509)) + (0.03149 * sqrt(491))) /2 # 0.5978468 -- reasonably close.

The true Cohen's D, again, is 0.8618. Dividing the regression coefficient of 0.47529 by the three different estimates of sd we got above, we get:

# control group
0.47529 / (0.02207 * sqrt(509)) # 0.95

# treatment group
0.47529 / (0.03149 * sqrt(491)) # 0.68 -- not so close

# average of the two
0.47529 / (((0.02207 * sqrt(509)) + (0.03149 * sqrt(491))) / 2) # 0.7950029

Something I don't understand, and hope to pick up tomorrow, is why these results aren't precisely right.
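One possible explanation, offered as a sketch rather than a settled answer, rests on two standard OLS facts. First, with a single binary predictor, the SE of the treatment coefficient is the SE of the difference in means, pooled SD times sqrt(1/n_t + 1/n_c), not the SE of the treatment group's mean. Second, the t-based conversion standardizes by the pooled within-group SD, while sd(Y0) also contains the between-group difference and is therefore larger. The code below checks both on freshly simulated data; all variable names are hypothetical and the numbers will not match those above.

```r
# Hypothetical stand-in data with the same design as the vignette's simulation
set.seed(3)
n <- 1000
treat <- rbinom(n, 1, 0.5)
y <- rnorm(n, mean = 3.5 + 0.5 * treat, sd = 0.5)
n_t <- sum(treat == 1)
n_c <- sum(treat == 0)

coefs   <- coef(summary(lm(y ~ treat)))
se_int  <- coefs["(Intercept)", "Std. Error"]
se_diff <- coefs["treat", "Std. Error"]
beta    <- coefs["treat", "Estimate"]
t_stat  <- coefs["treat", "t value"]

# Pooled within-group SD (the regression's residual SD)
sd_pooled <- sqrt(((n_t - 1) * var(y[treat == 1]) +
                   (n_c - 1) * var(y[treat == 0])) / (n - 2))

# Both reported SEs lead back to the pooled SD, each with its own multiplier
sd_from_int  <- se_int * sqrt(n_c)
sd_from_diff <- se_diff / sqrt(1 / n_t + 1 / n_c)

# Standardizing by the pooled SD reproduces the t-based conversion exactly;
# standardizing by the full-sample SD gives a smaller value
d_pooled  <- beta / sd_pooled
d_from_t  <- t_stat * sqrt((n_t + n_c) / (n_t * n_c))
d_overall <- beta / sd(y)

round(c(pooled = d_pooled, from_t = d_from_t, overall = d_overall), 3)
```

Applied to the numbers above, 0.03149 / sqrt(1/491 + 1/509) is roughly 0.498, in line with the control-group estimate, and 0.47529 / 0.498 is roughly 0.955, the value the t-test conversion returned. On this reading, the remaining gap to 0.862 comes from standardizing by sd(Y0) rather than by the within-group SD.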



setgree/ResultsStandardizeR documentation built on June 2, 2020, 11:48 a.m.