vcalc | R Documentation |
Function to construct or approximate the variance-covariance matrix of dependent effect sizes or outcomes, or more precisely, of their sampling errors (i.e., the V
matrix in rma.mv
). \loadmathjax
vcalc(vi, cluster, subgroup, obs, type, time1, time2, grp1, grp2, w1, w2,
data, rho, phi, rvars, checkpd=TRUE, nearpd=FALSE, sparse=FALSE, ...)
vi |
numeric vector to specify the sampling variances of the observed effect sizes or outcomes. |
cluster |
vector to specify the clustering variable (e.g., study). |
subgroup |
optional vector to specify different (independent) subgroups within clusters. |
obs |
optional vector to distinguish different observed effect sizes or outcomes corresponding to the same construct or response/dependent variable. |
type |
optional vector to distinguish different types of constructs or response/dependent variables underlying the observed effect sizes or outcomes. |
time1 |
optional numeric vector to specify the time points when the observed effect sizes or outcomes were obtained (in the first condition if the observed effect sizes or outcomes represent contrasts between two conditions). |
time2 |
optional numeric vector to specify the time points when the observed effect sizes or outcomes were obtained in the second condition (only relevant when the observed effect sizes or outcomes represent contrasts between two conditions). |
grp1 |
optional vector to specify the group of the first condition when the observed effect sizes or outcomes represent contrasts between two conditions. |
grp2 |
optional vector to specify the group of the second condition when the observed effect sizes or outcomes represent contrasts between two conditions. |
w1 |
optional numeric vector to specify the size of the group (or more generally, the inverse-sampling variance weight) of the first condition when the observed effect sizes or outcomes represent contrasts between two conditions. |
w2 |
optional numeric vector to specify the size of the group (or more generally, the inverse-sampling variance weight) of the second condition when the observed effect sizes or outcomes represent contrasts between two conditions. |
data |
optional data frame containing the variables given to the arguments above. |
rho |
argument to specify the correlation(s) of observed effect sizes or outcomes measured concurrently. See ‘Details’. |
phi |
argument to specify the autocorrelation of observed effect sizes or outcomes measured at different time points. See ‘Details’. |
rvars |
optional argument for specifying the variables that correspond to the correlation matrices of the studies (if this is specified, all arguments above except for |
checkpd |
logical to specify whether to check that the variance-covariance matrices within clusters are positive definite (the default is |
nearpd |
logical to specify whether the |
sparse |
logical to specify whether the variance-covariance matrix should be returned as a sparse matrix (the default is |
... |
other arguments. |
Standard meta-analytic models (such as those that can be fitted with the rma.uni
function) assume that the observed effect sizes or outcomes (or more precisely, their sampling errors) are independent. This assumption is typically violated whenever multiple observed effect sizes or outcomes are computed based on the same sample of subjects (or whatever the study units are) or if there is at least partial overlap of subjects that contribute information to the computation of multiple effect sizes or outcomes.
The present function can be used to construct or approximate the variance-covariance matrix of the sampling errors of dependent effect sizes or outcomes for a wide variety of circumstances (this variance-covariance matrix is the so-called V
matrix that may be needed as input for multilevel/multivariate meta-analytic models as can be fitted with the rma.mv
function; see also here for some recommendations on a general workflow for meta-analyses involving complex dependency structures).
Argument cluster
is used to specify the clustering variable. Rows with the same value of this variable are allowed to be dependent, while rows with different values are assumed to be independent. Typically, cluster
will be a study identifier.
Within the same cluster, there may be different subgroups with no overlap of subjects across subgroups. Argument subgroup
can be used to distinguish such subgroups. Rows with the same value of this variable within a cluster are allowed to be dependent, while rows with different values are assumed to be independent even if they come from the same cluster. Therefore, from hereon, ‘cluster’ really refers to the combination of cluster
and subgroup
.
Multiple effect sizes or outcomes belonging to the same cluster may be dependent due to a variety of reasons:
The same construct of interest (e.g., severity of depression) may have been measured using different scales or instruments within a study (e.g., using the Beck Depression Inventory (BDI) and the Hamilton Depression Rating Scale (HDRS)) based on which multiple effect sizes can be computed for the same group of subjects (e.g., contrasting a treatment versus a control group with respect to each scale). In this case, we have multiple effect sizes that are different ‘observations’ of the effect with respect to the same type of construct.
Argument obs
is then used to distinguish different effect sizes corresponding to the same construct. If obs
is specified, then argument rho
must also be used to specify the degree of correlation among the sampling errors of the different effect sizes. Since this correlation is typically not known, the correlation among the various scales (or a rough ‘guestimate’ thereof) can be used as a proxy (i.e., the (typical) correlation between BDI and HDRS measurements).
One can also pass an entire correlation matrix via rho
to specify, for each possible pair of obs
values, the corresponding correlation. The row/column names of the matrix must then correspond to the unique values of the obs
variable.
Multiple types of constructs (or more generally, types of response/dependent variables) may have been measured in the same group of subjects (e.g., severity of depression as measured with the Beck Depression Inventory (BDI) and severity of anxiety as measured with the State-Trait Anxiety Inventory (STAI)). If this is of interest for a meta-analysis, effect sizes can then be computed with respect to each ‘type’ of construct.
Argument type
is then used to distinguish effect sizes corresponding to these different types of constructs. If type
is specified, then argument rho
must also be used to specify the degree of correlation among the sampling errors of effect sizes belonging to these different types. As above, the correlation among the various scales is typically used here as a proxy (i.e., the (typical) correlation between BDI and STAI measurements).
One can also pass an entire correlation matrix via rho
to specify, for each possible pair of type
values, the corresponding correlation. The row/column names of the matrix must then correspond to the unique values of the type
variable.
If there are multiple types of constructs, multiple scales or instruments may also have been used (in at least some of the studies) to measure the same construct and hence there may again be multiple effect sizes that are ‘observations’ of the same type of construct. Arguments type
and obs
should then be used together to specify the various construct types and observations thereof. In this case, argument rho
must be a vector of two values, the first to specify the within-construct correlation and the second to specify the between-construct correlation.
One can also specify a list with two elements for rho
, the first element being either a scalar or an entire correlation matrix for the within-construct correlation(s) and the second element being a scalar or an entire correlation matrix for the between-construct correlation(s). As above, any matrices specified must have row/column names corresponding to the unique values of the obs
and/or type
variables.
The same construct and scale may have been assessed/used multiple times, allowing the computation of multiple effect sizes for the same group of subjects at different time points (e.g., right after the end of a treatment, at a short-term follow-up, and at a long-term follow-up). Argument time1
is then used to specify the time points when the observed effect sizes were obtained. Argument phi
must then also be used to specify the autocorrelation among the sampling errors of two effect sizes that differ by one unit on the time1
variable. As above, the autocorrelation of the measurements themselves can be used here as a proxy.
If multiple constructs and/or multiple scales have also been assessed at the various time points, then arguments type
and/or obs
(together with argument rho
) should be used as needed to differentiate effect sizes corresponding to the different constructs and/or scales.
Many effect sizes or outcome measures (e.g., raw or standardized mean differences, log-transformed ratios of means, log risk/odds ratios and risk differences) reflect the difference between two conditions (i.e., a contrast). Within a study, there may be more than two conditions, allowing the computation of multiple such contrasts (e.g., treatment A versus a control condition and treatment B versus the same control condition) and hence corresponding effect sizes. The reuse of information from the ‘shared’ condition (in this example, the control condition) then induces correlation among the effect sizes.
To account for this, arguments grp1
and grp2
should be used to specify (within each cluster) which two groups were compared in the computation of each effect size (e.g., in the example above, the coding could be grp1=c(2,3)
and grp2=c(1,1)
; whether numbers or strings are used as identifiers is irrelevant).
The degree of correlation between two contrast-type effect sizes that is induced by the use of a shared condition is a function of the size of the groups involved in the computation of the two effect sizes (or, more generally, the inverse-sampling variance weights of the condition-specific outcomes). By default, the group sizes (weights) are assumed to be identical across conditions, which implies a correlation of 0.5. If the group sizes (weights) are known, they can be specified via arguments w1
and w2
(in which case this information is used by the function to calculate a more accurate estimate of the correlation induced by the shared condition).
Moreover, a contrast-type effect size can be based on a between- or a within-subjects design. When at least one or more of the contrast-type effect sizes are based on a within-subjects design, then time1
and time2
should be used in combination with grp1
and grp2
to specify for each effect size the group(s) and time point(s) involved.
For example, grp1=c(2,3)
and grp2=c(1,1)
as above in combination with time1=c(1,1)
and time2=c(1,1)
would imply a between-subjects design involving three groups where two effect sizes were computed contrasting groups 2 and 3 versus group 1 at a single time point. On the other hand, grp1=c(1,1)
and grp2=c(1,1)
in combination with time1=c(2,3)
and time2=c(1,1)
would imply a within-subjects design where two effect sizes were computed contrasting time points 2 and 3 versus time point 1 in a single group. Argument phi
is then used as above to specify the autocorrelation of the measurements within groups (i.e., for the within-subjects design above, it would be the autocorrelation between time points 2 and 1 or equivalently, between time points 3 and 2).
All of the arguments above can be specified together to account for a fairly wide variety of dependency types.
rvars
ArgumentThe function also provides an alternative approach for constructing the variance-covariance matrix using the rvars
argument. Here, one must specify the names of the variables in the dataset that correspond to the correlation matrices of the studies. The variables should be specified as a vector (e.g., c(var1, var2, var3)
) and do not need to be quoted.
In particular, let \mjseqnk_i denote the number of rows corresponding to the \mjeqni\textthith cluster. Then the values of the first \mjseqnk_i variables from rvars
are used to construct the correlation matrix and, together with the sampling variances (specified via vi
), the variance-covariance matrix. Say there are three studies, the first with two correlated estimates, the second with a single estimate, and the third with four correlated estimates. Then the data structure should look like this:
study yi vi r1 r2 r3 r4 ============================= 1 . . 1 NA NA NA 1 . . .6 1 NA NA ----------------------------- 2 . . 1 NA NA NA ----------------------------- 3 . . 1 NA NA NA 3 . . .8 1 NA NA 3 . . .5 .5 1 NA 3 . . .5 .5 .8 1 =============================
with rvars = c(r1, r2, r3, r4)
. If the rvars
variables are a consecutive set in the data frame (as above), then one can use the shorthand notation rvars = c(r1:r4)
, so r1
denotes the first and r4
the last variable in the set. Note that only the lower triangular part of the submatrices defined by the rvars
variables is used. Also, it is important that the rows in the dataset corresponding to a particular study are in consecutive order as shown above.
There must be as many variables specified via rvars
as the number of rows in the largest cluster (in smaller clusters, the non-relevant variables can be set to NA
; see above).
A \mjeqnk \times kkxk variance-covariance matrix (given as a sparse matrix when sparse=TRUE
), where \mjseqnk denotes the length of the vi
variable (i.e., the number of rows in the dataset).
Depending on the data structure, the specified variables, and the specified values for rho
and/or phi
, it is possible that the constructed variance-covariance matrix is not positive definite within one or more clusters (this is checked when checkpd=TRUE
, which is the default). If such non-positive definite submatrices are found, the reasons for this should be carefully checked since this might indicate misapplication of the function and/or the specification of implausible values for rho
and/or phi
.
When setting nearpd=TRUE
, the nearPD
function from the Matrix package is used on variance-covariance submatrices that are not positive definite. This should only be used cautiously and after understanding why these matrices are not positive definite.
Wolfgang Viechtbauer (wvb@metafor-project.org, https://www.metafor-project.org) with some tweaks to speed up the computations by James Pustejovsky (pustejovsky@wisc.edu, https://jepusto.com
).
Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03
escalc
for a function to compute the observed effect sizes or outcomes (and corresponding sampling variances) for which a variance-covariance matrix could be constructed.
rcalc
for a function to construct the variance-covariance matrix of dependent correlation coefficients.
rma.mv
for a model fitting function that can be used to meta-analyze dependent effect sizes or outcomes.
############################################################################
### see help(dat.assink2016) for further details on this dataset
dat <- dat.assink2016
head(dat, 9)
### assume that the effect sizes within studies are correlated with rho=0.6
V <- vcalc(vi, cluster=study, obs=esid, data=dat, rho=0.6)
### show part of V matrix for studies 1 and 2
round(V[dat$study %in% c(1,2), dat$study %in% c(1,2)], 4)
### or show as list of matrices
blsplit(V, dat$study, round, 4)[1:2]
### use a correlation of 0.7 for effect sizes corresponding to the same type of
### delinquent behavior and a correlation of 0.5 for effect sizes corresponding
### to different types of delinquent behavior
V <- vcalc(vi, cluster=study, type=deltype, obs=esid, data=dat, rho=c(0.7, 0.5))
blsplit(V, dat$study, round, 3)[16]
### examine the correlation matrix for study 16
blsplit(V, dat$study, cov2cor)[16]
############################################################################
### see help(dat.ishak2007) for further details on this dataset
dat <- dat.ishak2007
head(dat, 5)
### create long format dataset
dat <- reshape(dat, direction="long", idvar="study", v.names=c("yi","vi"),
varying=list(c(2,4,6,8), c(3,5,7,9)))
dat <- dat[order(study, time),]
### remove missing measurement occasions from dat
dat <- dat[!is.na(yi),]
rownames(dat) <- NULL
### show the data for the first 5 studies
head(dat, 8)
### construct the full (block diagonal) V matrix with an AR(1) structure
### assuming an autocorrelation of 0.97 as estimated by Ishak et al. (2007)
V <- vcalc(vi, cluster=study, time1=time, phi=0.97, data=dat)
V[1:8, 1:8]
cov2cor(V[1:8, 1:8])
### or show as a list of matrices
blsplit(V, dat$study)[1:5]
blsplit(V, dat$study, cov2cor)[1:5]
############################################################################
### see help(dat.kalaian1996) for further details on this dataset
dat <- dat.kalaian1996
head(dat, 12)
### construct the variance-covariance matrix assuming rho = 0.66 for effect sizes
### corresponding to the 'verbal' and 'math' outcome types
V <- vcalc(vi, cluster=study, type=outcome, data=dat, rho=0.66)
round(V[1:12,1:12], 4)
############################################################################
### see help(dat.berkey1998) for further details on this dataset
dat <- dat.berkey1998
### variables v1i and v2i correspond to the 2x2 var-cov matrices of the studies;
### so use these variables to construct the V matrix (note: since v1i and v2i are
### var-cov matrices and not correlation matrices, set vi=1 for all rows)
V <- vcalc(vi=1, cluster=author, rvars=c(v1i, v2i), data=dat)
V
round(cov2cor(V), 2)
### or show as a list of matrices
blsplit(V, dat$author, function(x) round(cov2cor(x), 2))
### construct the variance-covariance matrix assuming rho = 0.4 for effect sizes
### corresponding to the 'PD' and 'AL' outcome types
V <- vcalc(vi=vi, cluster=trial, type=outcome, data=dat, rho=0.4)
round(V,4)
cov2cor(V)
############################################################################
### see help(dat.knapp2017) for further details on this dataset
dat <- dat.knapp2017
dat[-c(1:2)]
### create variable that indicates the task and difficulty combination as increasing integers
dat$task.diff <- unlist(lapply(split(dat, dat$study), function(x) {
task.int <- as.integer(factor(x$task))
diff.int <- as.integer(factor(x$difficulty))
diff.int[is.na(diff.int)] <- 1
paste0(task.int, ".", diff.int)}))
### construct correlation matrix for two tasks with four different difficulties where the
### correlation is 0.4 for different difficulties of the same task, 0.7 for the same
### difficulty of different tasks, and 0.28 for different difficulties of different tasks
R <- matrix(0.4, nrow=8, ncol=8)
R[5:8,1:4] <- R[1:4,5:8] <- 0.28
diag(R[1:4,5:8]) <- 0.7
diag(R[5:8,1:4]) <- 0.7
diag(R) <- 1
rownames(R) <- colnames(R) <- paste0(rep(1:2, each=4), ".", 1:4)
R
### construct an approximate V matrix accounting for the use of shared groups and
### for correlations among tasks/difficulties as specified in the R matrix above
V <- vcalc(vi, cluster=study, grp1=group1, grp2=group2, w1=n_sz, w2=n_hc,
obs=task.diff, rho=R, data=dat)
Vs <- blsplit(V, dat$study)
cov2cor(Vs[[3]]) # study with multiple SZ groups and a single HC group
cov2cor(Vs[[6]]) # study with two task types and multiple difficulties
cov2cor(Vs[[12]]) # study with multiple difficulties for the same task
cov2cor(Vs[[24]]) # study with separate rows for males and females
cov2cor(Vs[[29]]) # study with separate rows for three genotypes
############################################################################
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.