conv.wald    R Documentation

Function to convert Wald-type confidence intervals (CIs) and test statistics (or the corresponding p-values) to sampling variances.
conv.wald(out, ci.lb, ci.ub, zval, pval, n, data, include,
level=95, transf, check=TRUE, var.names, append=TRUE, replace="ifna", ...)
out: vector with the observed effect sizes or outcomes.
ci.lb: vector with the lower bounds of the corresponding Wald-type CIs.
ci.ub: vector with the upper bounds of the corresponding Wald-type CIs.
zval: vector with the Wald-type test statistics.
pval: vector with the p-values of the Wald-type tests.
n: vector with the total sample sizes of the studies.
data: optional data frame containing the variables given to the arguments above.
include: optional (logical or numeric) vector to specify the subset of studies for which the conversion should be carried out.
level: numeric value (or vector) to specify the confidence interval level(s) (the default is 95).
transf: optional argument to specify a function to transform the values supplied via out, ci.lb, and ci.ub (e.g., transf=log); if unspecified, no transformation is used.
check: logical to specify whether the function should carry out a check to examine if the point estimates fall (approximately) halfway between the CI bounds (the default is TRUE).
var.names: character vector with two elements to specify the name of the variable for the observed effect sizes or outcomes and the name of the variable for the corresponding sampling variances (the defaults are "yi" and "vi").
append: logical to specify whether the data frame provided via the data argument should be returned together with the computed values (the default is TRUE).
replace: character string or logical to specify how existing values in the data frame should be replaced (the default is "ifna"; see below for details).
...: other arguments.
The escalc function can be used to compute a wide variety of effect sizes or ‘outcome measures’. However, the inputs required to compute certain measures with this function may not be reported for all of the studies. Under certain circumstances, other information (such as point estimates and corresponding confidence intervals and/or test statistics) may be available that can be converted into the appropriate format needed for a meta-analysis. The purpose of the present function is to facilitate this process.

The function typically takes a data frame created with the escalc function as input via the data argument. This object should contain variables yi and vi (unless argument var.names was used to adjust these variable names when the "escalc" object was created) for the observed effect sizes or outcomes and the corresponding sampling variances, respectively. For some studies, the values for these variables may be missing.
In some studies, the effect size estimate or observed outcome may already be reported. If so, such values can be supplied via the out argument and are then substituted for missing yi values. At times, it may be necessary to transform the reported values (e.g., reported odds ratios to log odds ratios). Via argument transf, an appropriate transformation function can be specified (e.g., transf=log), in which case y_i = f(out), where f(.) is the function specified via transf.
Moreover, a confidence interval (CI) may have been reported together with the estimate. The bounds of the CI can be supplied via arguments ci.lb and ci.ub, which are also transformed if a function is specified via transf. Assume that the bounds were obtained from a Wald-type CI of the form y_i ± z_crit * sqrt(v_i) (on the transformed scale if transf is specified), where v_i is the sampling variance corresponding to the effect size estimate or observed outcome (so that sqrt(v_i) is the corresponding standard error) and z_crit is the appropriate critical value from a standard normal distribution (e.g., 1.96 for a 95% CI). Then

   v_i = ((ci.ub - ci.lb) / (2 * z_crit))^2

is used to back-calculate the sampling variances of the (transformed) effect size estimates or observed outcomes and these values are then substituted for missing vi values in the dataset.
For example, consider the following dataset of three RCTs used as input for a meta-analysis of log odds ratios:
dat <- data.frame(study = 1:3,
                  cases.trt = c(23, NA, 4), n.trt = c(194, 183, 46),
                  cases.plc = c(38, NA, 7), n.plc = c(201, 188, 44),
                  oddsratio = c(NA, 0.64, NA), lower = c(NA, 0.33, NA), upper = c(NA, 1.22, NA))
dat <- escalc(measure="OR", ai=cases.trt, n1i=n.trt, ci=cases.plc, n2i=n.plc, data=dat)
dat

#   study cases.trt n.trt cases.plc n.plc oddsratio lower upper      yi     vi
# 1     1        23   194        38   201        NA    NA    NA -0.5500 0.0818
# 2     2        NA   183        NA   188      0.64  0.33  1.22      NA     NA
# 3     3         4    46         7    44        NA    NA    NA -0.6864 0.4437
where variable yi contains the log odds ratios and vi the corresponding sampling variances, as computed from the counts and group sizes by escalc().

Study 2 does not report the counts (or sufficient information to reconstruct them), but it directly reports the odds ratio and a corresponding 95% confidence interval (CI), as given by variables oddsratio, lower, and upper. The CI is a standard Wald-type CI that was computed on the log scale (and whose bounds were then exponentiated). The present function can then be used as follows:
dat <- conv.wald(out=oddsratio, ci.lb=lower, ci.ub=upper, data=dat, transf=log)
dat

#   study cases.trt n.trt cases.plc n.plc oddsratio lower upper      yi     vi
# 1     1        23   194        38   201        NA    NA    NA -0.5500 0.0818
# 2     2        NA   183        NA   188      0.64  0.33  1.22 -0.4463 0.1113
# 3     3         4    46         7    44        NA    NA    NA -0.6864 0.4437
Now variables yi and vi in the dataset are complete.
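For instance, the back-calculated values for study 2 can be verified by hand by applying the equations above (with z_crit = qnorm(0.975), the critical value for a 95% CI):

log(0.64)                                       # yi = -0.4463
((log(1.22) - log(0.33)) / (2*qnorm(0.975)))^2  # vi =  0.1113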
If the CI was not a 95% CI, then one can specify the appropriate level via the level argument. This can also be an entire vector in case different studies used different levels.
By default (i.e., when check=TRUE), the function carries out a rough check to examine if the point estimate falls (approximately) halfway between the CI bounds (on the transformed scale) for each study for which the conversion was carried out. A warning is issued if there are studies where this is not the case. This may indicate that a particular CI was not a Wald-type CI or was computed on a different scale (in which case the back-calculation above would be inappropriate), but can also arise due to rounding of the reported values (in which case the back-calculation would still be appropriate, albeit possibly a bit inaccurate). Care should be taken when using such back-calculated values in a meta-analysis.
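Roughly speaking, the check amounts to comparing the point estimate with the midpoint of the CI bounds on the transformed scale (a simplified sketch of the idea, not the exact tolerance used internally). For study 2 above:

(log(0.33) + log(1.22)) / 2  # midpoint of the CI bounds on the log scale: -0.4549
log(0.64)                    # reported point estimate on the log scale:   -0.4463 (close; the difference is due to rounding)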
Similarly, study authors may report the test statistic and/or p-value from a Wald-type test of the form zval = y_i / sqrt(v_i) (on the transformed scale if transf is specified), with the corresponding two-sided p-value given by pval = 2 * (1 - Phi(|zval|)), where Phi(.) denotes the cumulative distribution function of a standard normal distribution (i.e., pnorm). Test statistics and/or corresponding p-values of this form can be supplied via arguments zval and pval.
A given p-value can be back-transformed into the corresponding test statistic (if it is not already available) with zval = Phi^(-1)(1 - pval/2), where Phi^(-1)(.) denotes the quantile function (i.e., the inverse of the cumulative distribution function) of a standard normal distribution (i.e., qnorm). Then

   v_i = (y_i / zval)^2

is used to back-calculate a missing vi value in the dataset.

Note that the conversion of a p-value to the corresponding test statistic (which is then converted into a sampling variance) as shown above assumes that the exact p-value is reported. If authors only report that the p-value fell below a certain threshold (e.g., p < .01, or if authors only state that the test was significant, which typically implies p < .05), then a common approach is to use the value of the cutoff reported (e.g., if p < .01 is reported, then assume p = .01), which is conservative (since the actual p-value was below that assumed value by some unknown amount). The conversion will therefore tend to be much less accurate.
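To see why the cutoff approach is conservative, consider a hypothetical estimate of y_i = -0.45 whose exact p-value is .003 but which is only reported as 'p < .01' (made-up numbers for illustration):

yi <- -0.45                   # hypothetical (log) effect size estimate
(yi / qnorm(1 - 0.003/2))^2   # vi based on the exact p-value:   0.0230
(yi / qnorm(1 - 0.01/2))^2    # vi based on the assumed p = .01: 0.0305 (larger, so the study receives less weight)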
Using the earlier example, suppose that only the odds ratio and the corresponding two-sided p-value from a Wald-type test (of whether the log odds ratio differs significantly from zero) are reported for study 2.
dat <- data.frame(study = 1:3,
                  cases.trt = c(23, NA, 4), n.trt = c(194, 183, 46),
                  cases.plc = c(38, NA, 7), n.plc = c(201, 188, 44),
                  oddsratio = c(NA, 0.64, NA), pval = c(NA, 0.17, NA))
dat <- escalc(measure="OR", ai=cases.trt, n1i=n.trt, ci=cases.plc, n2i=n.plc, data=dat)
dat

#   study cases.trt n.trt cases.plc n.plc oddsratio pval      yi     vi
# 1     1        23   194        38   201        NA   NA -0.5500 0.0818
# 2     2        NA   183        NA   188      0.64 0.17      NA     NA
# 3     3         4    46         7    44        NA   NA -0.6864 0.4437
Then the function can be used as follows:
dat <- conv.wald(out=oddsratio, pval=pval, data=dat, transf=log)
dat

#   study cases.trt n.trt cases.plc n.plc oddsratio pval      yi     vi
# 1     1        23   194        38   201        NA   NA -0.5500 0.0818
# 2     2        NA   183        NA   188      0.64 0.17 -0.4463 0.1058
# 3     3         4    46         7    44        NA   NA -0.6864 0.4437
Note that the back-calculated sampling variance for study 2 is not identical in these two examples, because the CI bounds and the p-value are rounded to two decimal places, which introduces some inaccuracies. Also, if both the CI bounds (ci.lb, ci.ub) and either zval or pval are available for a study, then the back-calculation of v_i via the confidence interval is preferred.
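As before, the back-calculated sampling variance for study 2 can be verified by hand by applying the equations above:

zval <- qnorm(1 - 0.17/2)   # test statistic implied by the reported p-value: 1.3722
(log(0.64) / zval)^2        # vi = 0.1058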
Optionally, one can use the n argument to supply the total sample sizes of the studies. This has no relevance for the calculations done by the present function, but some other functions may use this information (e.g., when drawing a funnel plot with the funnel function and one adjusts the yaxis argument to one of the options that puts the sample sizes or some transformation thereof on the y-axis).
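For instance, the conversion call from the p-value example could also pass along the total sample sizes, which can later be shown on the y-axis of a funnel plot (a hypothetical sketch; it assumes that the sample sizes supplied via n are carried along to the model object fitted with rma(), so that funnel() can use them via its yaxis="ni" option):

dat$ntot <- with(dat, n.trt + n.plc)   # total sample sizes of the three studies
dat <- conv.wald(out=oddsratio, pval=pval, n=ntot, data=dat, transf=log)

res <- rma(yi, vi, data=dat)   # if the sample sizes are not picked up automatically, they
                               # could instead be passed to rma() directly via its ni argument
funnel(res, yaxis="ni")        # funnel plot with the sample sizes on the y-axis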
If the data argument was not specified or append=FALSE, then a data frame of class c("escalc","data.frame") is returned, with two variables called var.names[1] (by default "yi") and var.names[2] (by default "vi") containing the (transformed) observed effect sizes or outcomes and the corresponding sampling variances (computed as described above).
If data was specified and append=TRUE, then the original data frame is returned. If var.names[1] is a variable in data and replace="ifna" (or replace=FALSE), then only missing values in this variable are replaced with the (possibly transformed) observed effect sizes or outcomes from out (where possible); if var.names[1] is not a variable in data, then a new variable with this name is added to the data frame. Similarly, if var.names[2] is a variable in data and replace="ifna" (or replace=FALSE), then only missing values in this variable are replaced with the sampling variances back-calculated as described above (where possible); if var.names[2] is not a variable in data, then a new variable with this name is added to the data frame.
If replace="all" (or replace=TRUE), then all values in var.names[1] and var.names[2] are replaced, even for cases where the existing values in these variables are not missing.
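To illustrate the difference (a hypothetical sketch building on the CI example in the Details section; the reported values for study 1 below are made up, though roughly consistent with its 2x2 table counts):

# suppose study 1 had also reported an odds ratio and 95% CI (in addition to the counts)
dat$oddsratio[1] <- 0.58
dat$lower[1]     <- 0.33
dat$upper[1]     <- 1.01

# default (replace="ifna"): study 1 keeps the yi/vi computed by escalc() from the counts;
# only the missing values of study 2 are filled in from the reported odds ratio and CI
conv.wald(out=oddsratio, ci.lb=lower, ci.ub=upper, data=dat, transf=log)

# replace="all": the yi/vi of study 1 are also overwritten with the values
# back-calculated from the reported odds ratio and CI
conv.wald(out=oddsratio, ci.lb=lower, ci.ub=upper, data=dat, transf=log, replace="all")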
A word of caution: Except for the check on the CI bounds, there is no way to determine whether the back-calculations done by the function are appropriate in a given context. They are only appropriate when the CI bounds and test statistics (or p-values) arose from Wald-type CIs / tests as described above. Using the same back-calculations for other purposes is likely to yield nonsensical values.
Wolfgang Viechtbauer wvb@metafor-project.org https://www.metafor-project.org
Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03
escalc for a function to compute various effect size measures.
### a very simple example
dat <- data.frame(or=c(1.37,1.89), or.lb=c(1.03,1.60), or.ub=c(1.82,2.23))
dat
### convert the odds ratios and CIs into log odds ratios with corresponding sampling variances
dat <- conv.wald(out=or, ci.lb=or.lb, ci.ub=or.ub, data=dat, transf=log)
dat
############################################################################
### a more elaborate example based on the BCG vaccine dataset
dat <- dat.bcg[,c(2:7)]
dat
### with complete data, we can use escalc() in the usual way
dat1 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat)
dat1
### random-effects model fitted to these data
res1 <- rma(yi, vi, data=dat1)
res1
### now suppose that the 2x2 table data are not reported in all studies, but that the
### following dataset could be assembled based on information reported in the studies
dat2 <- data.frame(summary(dat1))
dat2[c("yi", "ci.lb", "ci.ub")] <- data.frame(summary(dat1, transf=exp))[c("yi", "ci.lb", "ci.ub")]
names(dat2)[which(names(dat2) == "yi")] <- "or"
dat2[,c("or","ci.lb","ci.ub","pval")] <- round(dat2[,c("or","ci.lb","ci.ub","pval")], digits=2)
dat2$vi <- dat2$sei <- dat2$zi <- NULL
dat2$ntot <- with(dat2, tpos + tneg + cpos + cneg)
dat2[c(1,12),c(3:6,9:10)] <- NA
dat2[c(4,9), c(3:6,8)] <- NA
dat2[c(2:3,5:8,10:11,13),c(7:10)] <- NA
dat2$ntot[!is.na(dat2$tpos)] <- NA
dat2
### in studies 1 and 12, authors reported only the odds ratio and the corresponding p-value
### in studies 4 and 9, authors reported only the odds ratio and the corresponding 95% CI
### use escalc() first
dat2 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat2)
dat2
### fill in the missing log odds ratios and sampling variances
dat2 <- conv.wald(out=or, ci.lb=ci.lb, ci.ub=ci.ub, pval=pval, n=ntot, data=dat2, transf=log)
dat2
### random-effects model fitted to these data
res2 <- rma(yi, vi, data=dat2)
res2
### any differences between res1 and res2 are a result of or, ci.lb, ci.ub, and pval being
### rounded in dat2 to two decimal places; without rounding, the results would be identical