matreg: Fit Regression Models based on Correlation and Covariance...


matreg    R Documentation

Fit Regression Models based on Correlation and Covariance Matrices

Description

Function to fit regression models based on correlation and covariance matrices.

Usage

matreg(y, x, R, n, V, cov=FALSE, means, ztor=FALSE,
       nearpd=FALSE, level=95, digits, ...)

Arguments

y

index (or name given as a character string) of the outcome variable.

x

indices (or names given as a character vector) of the predictor variables.

R

correlation or covariance matrix (or only the lower triangular part including the diagonal).

n

sample size based on which the elements in the correlation/covariance matrix were computed.

V

variance-covariance matrix of the lower triangular elements of the correlation/covariance matrix. Either V or n should be specified, not both. See ‘Details’.

cov

logical to specify whether R is a covariance matrix (the default is FALSE).

means

optional vector to specify the means of the variables (only relevant when cov=TRUE).

ztor

logical to specify whether R is a matrix of r-to-z transformed correlations and hence should be back-transformed to raw correlations (the default is FALSE). See ‘Details’.

nearpd

logical to specify whether the nearPD function from the Matrix package should be used when the $R_{x,x}$ matrix cannot be inverted. See ‘Note’.

level

numeric value between 0 and 100 to specify the confidence interval level (the default is 95).

digits

optional integer to specify the number of decimal places to which the printed results should be rounded.

...

other arguments.

Details

Let $R$ be a $p \times p$ correlation or covariance matrix. Let $y$ denote the row/column of the outcome variable and $x$ the row(s)/column(s) of the predictor variable(s) in this matrix. Let $R_{x,x}$ and $R_{x,y}$ denote the corresponding submatrices of $R$. Then

$$b = R_{x,x}^{-1} R_{x,y}$$

yields the standardized or raw regression coefficients (depending on whether $R$ is a correlation or covariance matrix, respectively) when regressing the outcome variable on the predictor variable(s).
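The coefficient formula above can be checked by hand with base R. The following is a minimal sketch, not the internal code of matreg(); it uses the built-in 'mtcars' data, and the object names (b_std, b_raw, and so on) are chosen here for illustration only.

```r
# correlation matrix, outcome, and predictors (base R 'mtcars' data)
R <- cor(mtcars)
y <- "mpg"
x <- c("hp", "wt", "am")

# standardized coefficients: b = R[x,x]^(-1) R[x,y]
b_std <- solve(R[x, x], R[x, y])

# the same formula applied to the covariance matrix gives the raw slopes
S <- cov(mtcars)
b_raw <- solve(S[x, x], S[x, y])

# reference values from lm(): standardized data without intercept, raw data with intercept
b_std_lm <- coef(lm(mpg ~ 0 + hp + wt + am, data = as.data.frame(scale(mtcars))))
b_raw_lm <- coef(lm(mpg ~ hp + wt + am, data = mtcars))[x]

# both comparisons should report TRUE
all.equal(unname(b_std), unname(b_std_lm))
all.equal(unname(b_raw), unname(b_raw_lm))
```

Using solve(A, b) rather than solve(A) %*% b avoids forming the inverse explicitly, which is usually the numerically safer choice.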

The $R$ matrix may be computed based on a single sample of $n$ subjects. In this case, one should specify the sample size via argument n. The variance-covariance matrix of the standardized regression coefficients is then given by $\text{Var}[b] = \text{MSE} \times R_{x,x}^{-1}$, where $\text{MSE} = (1 - b'R_{x,y}) / (n - m)$ and $m$ denotes the number of predictor variables. The standard errors are then given by the square root of the diagonal elements of $\text{Var}[b]$. Test statistics (in this case, t-statistics) and the corresponding p-values can then be computed as in a regular regression analysis. When $R$ is a covariance matrix, one should set cov=TRUE and specify the means of the $p$ variables via argument means to obtain raw regression coefficients including the intercept and corresponding standard errors.
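The single-sample computations can be reproduced step by step in base R. This is a sketch under two assumptions that are not stated in the text above: the t-tests use n - m degrees of freedom (matching lm() without an intercept on standardized data), and the raw-scale intercept is obtained from the usual least-squares identity, intercept = mean of y minus b' times the means of x. It is not the internal code of matreg().

```r
R <- cor(mtcars)
n <- nrow(mtcars)
y <- "mpg"
x <- c("hp", "wt", "am")
m <- length(x)

# standardized coefficients and their variance-covariance matrix
Rxx_inv <- solve(R[x, x])
b   <- drop(Rxx_inv %*% R[x, y])
mse <- (1 - sum(b * R[x, y])) / (n - m)
vb  <- mse * Rxx_inv

# standard errors, t-statistics, and two-sided p-values (df = n - m is an assumption)
se   <- sqrt(diag(vb))
tval <- b / se
pval <- 2 * pt(abs(tval), df = n - m, lower.tail = FALSE)

# reference fit on standardized data
fit <- lm(mpg ~ 0 + hp + wt + am, data = as.data.frame(scale(mtcars)))
tab_lm <- coef(summary(fit))

# raw-scale slopes and intercept from the covariance matrix and the means
S  <- cov(mtcars)
mu <- colMeans(mtcars)
b_raw <- solve(S[x, x], S[x, y])
b0    <- unname(mu[y] - sum(b_raw * mu[x]))
b0_lm <- unname(coef(lm(mpg ~ hp + wt + am, data = mtcars))[1])
```

Comparing se, tval, and pval with the corresponding columns of tab_lm, and b0 with b0_lm, shows that the matrix-based results agree with lm().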

Alternatively, $R$ may be the result of a meta-analysis of correlation coefficients. In this case, the elements in $R$ are pooled correlation coefficients and the variance-covariance matrix of these pooled coefficients should be specified via argument V. The order of elements in V should correspond to the order of elements in the lower triangular part of $R$ column-wise. For example, if $R$ is a $4 \times 4$ matrix of the form:

$$\begin{bmatrix} 1 & & & \\ r_{21} & 1 & & \\ r_{31} & r_{32} & 1 & \\ r_{41} & r_{42} & r_{43} & 1 \end{bmatrix}$$

then the elements are $r_{21}$, $r_{31}$, $r_{41}$, $r_{32}$, $r_{42}$, and $r_{43}$ and hence V should be a $6 \times 6$ variance-covariance matrix of these elements in this order.

The variance-covariance matrix of the standardized regression coefficients (i.e., $\text{Var}[b]$) is then computed as a function of V as described in Becker (1992) using the multivariate delta method. The standard errors are then again given by the square root of the diagonal elements of $\text{Var}[b]$. Test statistics (in this case, z-statistics) and the corresponding p-values can then be computed in the usual manner.
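The column-wise ordering and the idea behind the delta method can be illustrated in base R. The sketch below uses made-up correlations and a made-up diagonal V (both hypothetical, for illustration only), and it approximates the derivatives of b numerically by central differences, whereas Becker (1992) works with analytic derivatives. The helper b_from_vec() is likewise hypothetical and not part of any package.

```r
# hypothetical pooled correlations, filled in column-wise: r21, r31, r41, r32, r42, r43
R <- diag(4)
R[lower.tri(R)] <- c(0.30, 0.25, 0.40, 0.20, 0.35, 0.50)
R[upper.tri(R)] <- t(R)[upper.tri(R)]

# labels of the lower triangular elements in the order expected for V
idx <- which(lower.tri(R), arr.ind = TRUE)
labels <- unname(paste0("r", idx[, "row"], idx[, "col"]))

# hypothetical helper: coefficients as a function of the vector of correlations
b_from_vec <- function(r, y = 1, x = 2:4, p = 4) {
  M <- diag(p)
  M[lower.tri(M)] <- r
  M[upper.tri(M)] <- t(M)[upper.tri(M)]
  drop(solve(M[x, x, drop = FALSE], M[x, y, drop = FALSE]))
}

# numeric Jacobian of b with respect to the correlations (3 x 6)
r   <- R[lower.tri(R)]
eps <- 1e-6
A <- sapply(seq_along(r), function(j) {
  e <- replace(numeric(length(r)), j, eps)
  (b_from_vec(r + e) - b_from_vec(r - e)) / (2 * eps)
})

# hypothetical var-cov matrix of the six correlations (assumed independent here)
V <- diag(0.01, 6)

# delta method: Var[b] is approximated by A V A'
Vb   <- A %*% V %*% t(A)
se   <- sqrt(diag(Vb))
zval <- b_from_vec(r) / se
```

In practice V would come from the meta-analytic model (e.g., vcov() of a fitted rma.mv object) rather than being specified by hand.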

In case $R$ is the result of a meta-analysis of Fisher r-to-z transformed correlation coefficients (and hence V is then the corresponding variance-covariance matrix of these pooled transformed coefficients), one should set argument ztor=TRUE, so that the appropriate back-transformation is then applied to R (and V) within the function.
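For intuition, the back-transformation can be sketched in base R. The values of z and Vz below are hypothetical, and the first-order delta method shown for the variances is an assumption about what an "appropriate back-transformation" of V looks like, not a statement about the exact internal code of matreg().

```r
# hypothetical pooled r-to-z transformed correlations and their var-cov matrix
z  <- c(0.31, 0.26, 0.42)
Vz <- diag(c(0.004, 0.005, 0.006))

# back-transform to raw correlations: r = tanh(z)
r <- tanh(z)

# first-order delta method: dr/dz = 1 - tanh(z)^2 = 1 - r^2
D  <- diag(1 - r^2)
Vr <- D %*% Vz %*% D

round(r, 3)
round(diag(Vr), 5)
```

Since the derivative 1 - r^2 is below 1 for any nonzero correlation, the back-transformed variances are smaller than the variances on the z scale.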

Finally, $R$ may be a covariance matrix based on a meta-analysis (e.g., the estimated variance-covariance matrix of the random effects in a multivariate model). In this case, one should set cov=TRUE and V should again be the variance-covariance matrix of the elements in $R$, but now including the diagonal. Hence, if $R$ is a $4 \times 4$ matrix of the form:

$$\begin{bmatrix} \tau_1^2 & & & \\ \tau_{21} & \tau_2^2 & & \\ \tau_{31} & \tau_{32} & \tau_3^2 & \\ \tau_{41} & \tau_{42} & \tau_{43} & \tau_4^2 \end{bmatrix}$$

then the elements are $\tau_1^2$, $\tau_{21}$, $\tau_{31}$, $\tau_{41}$, $\tau_2^2$, $\tau_{32}$, $\tau_{42}$, $\tau_3^2$, $\tau_{43}$, and $\tau_4^2$, and hence V should be a $10 \times 10$ variance-covariance matrix of these elements in this order. Argument means can then again be used to specify the means of the variables.
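The required ordering with the diagonal included can be generated in base R, which may help when checking that a given V lines up with R. This is only a small illustrative sketch; the label format is chosen here for readability.

```r
p <- 4

# positions of the lower triangular elements (including the diagonal), column-wise
idx <- which(lower.tri(diag(p), diag = TRUE), arr.ind = TRUE)

# variances on the diagonal, covariances below it
labels <- unname(ifelse(idx[, "row"] == idx[, "col"],
                        paste0("tau", idx[, "row"], "^2"),
                        paste0("tau", idx[, "row"], idx[, "col"])))
labels

# number of elements, and hence the required dimension of V
length(labels)
```

For a $p \times p$ covariance matrix, V must therefore have $p(p+1)/2$ rows and columns, compared with $p(p-1)/2$ in the correlation case.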

Value

An object of class "matreg". The object is a list containing the following components:

tab

a data frame with the estimated model coefficients, standard errors, test statistics, degrees of freedom (only for t-tests), p-values, and lower/upper confidence interval bounds.

vb

the variance-covariance matrix of the estimated model coefficients.

...

some additional elements/values.

The results are formatted and printed with the print function. Extractor functions include coef and vcov.

Note

Only the lower triangular part of R (and V if it is specified) is used in the computations.

If $R_{x,x}$ is not invertible, an error will be issued. In this case, one can set argument nearpd=TRUE, in which case the nearPD function from the Matrix package will be used to find the nearest positive definite matrix, which should be invertible. The results should be treated with caution when this is done.
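To see what this involves, nearPD() can be applied directly to a singular matrix. The sketch below uses a hypothetical predictor correlation matrix with two perfectly correlated predictors; the arguments that matreg() passes to nearPD() internally are not documented here, so the call with corr=TRUE is an assumption made for illustration.

```r
library(Matrix)

# hypothetical correlation matrix of two perfectly correlated predictors
Rxx <- matrix(1, nrow = 2, ncol = 2)

# direct inversion fails because the matrix is exactly singular
inv_try <- try(solve(Rxx), silent = TRUE)
inherits(inv_try, "try-error")

# nearest positive definite correlation matrix, which can be inverted
Rxx_pd  <- as.matrix(nearPD(Rxx, corr = TRUE)$mat)
Rxx_inv <- solve(Rxx_pd)

# the smallest eigenvalue is positive but tiny, so the inverse has very large elements
min(eigen(Rxx_pd, symmetric = TRUE)$values)
max(abs(Rxx_inv))
```

The very large elements of the inverse show why results obtained this way should be treated with caution: the adjusted matrix is invertible but still nearly singular.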

When $R$ is a covariance matrix with V and means specified, the means are treated as known constants when estimating the standard error of the intercept.

Author(s)

Wolfgang Viechtbauer wvb@metafor-project.org https://www.metafor-project.org

References

Becker, B. J. (1992). Using results from replicated studies to estimate linear models. Journal of Educational Statistics, 17(4), 341–362. https://doi.org/10.3102/10769986017004341

Becker, B. J. (1995). Corrections to "Using results from replicated studies to estimate linear models". Journal of Educational and Behavioral Statistics, 20(1), 100–102. https://doi.org/10.3102/10769986020001100

Becker, B. J., & Aloe, A. (2019). Model-based meta-analysis and related approaches. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (3rd ed., pp. 339–363). New York: Russell Sage Foundation.

See Also

rma.mv for a function to meta-analyze multiple correlation coefficients that can be used to construct an $R$ matrix.

rcalc for a function to construct the variance-covariance matrix of dependent correlation coefficients.

Examples

############################################################################

### first an example unrelated to meta-analysis, simply demonstrating that
### one can obtain the same results from lm() and matreg()

### fit a regression model with lm() to the 'mtcars' dataset
res <- lm(mpg ~ hp + wt + am, data=mtcars)
summary(res)

### covariance matrix of the dataset
S <- cov(mtcars)

### fit the same regression model using matreg()
res <- matreg(y="mpg", x=c("hp","wt","am"), R=S, cov=TRUE,
              means=colMeans(mtcars), n=nrow(mtcars))
summary(res)

### copy the 'mtcars' dataset to 'dat' and standardize all variables
dat <- mtcars
dat[] <- scale(dat)

### fit a regression model with lm() to obtain standardized regression coefficients ('betas')
res <- lm(mpg ~ 0 + hp + wt + am, data=dat)
summary(res)

### correlation matrix of the dataset
R <- cor(mtcars)

### fit the same regression model using matreg()
res <- matreg(y="mpg", x=c("hp","wt","am"), R=R, n=nrow(mtcars))
summary(res)

### note: the standard errors of the betas should not be used to construct CIs
### as they assume that the null hypothesis (H0: beta_j = 0) is true

### construct the var-cov matrix of correlations in R
V <- rcalc(R, ni=nrow(mtcars))$V

### fit the same regression model using matreg() but now supply V
res <- matreg(y="mpg", x=c("hp","wt","am"), R=R, V=V)
summary(res)

### the standard errors computed in this way can now be used to construct
### CIs for the betas (here, the difference is relatively small)

############################################################################

### copy data into 'dat'
dat <- dat.craft2003

### construct dataset and var-cov matrix of the correlations
tmp <- rcalc(ri ~ var1 + var2 | study, ni=ni, data=dat)
V <- tmp$V
dat <- tmp$dat

### turn var1.var2 into a factor with the desired order of levels
dat$var1.var2 <- factor(dat$var1.var2,
   levels=c("acog.perf", "asom.perf", "conf.perf", "acog.asom", "acog.conf", "asom.conf"))

### multivariate random-effects model
res <- rma.mv(yi, V, mods = ~ var1.var2 - 1, random = ~ var1.var2 | study, struct="UN", data=dat)
res

### restructure estimated mean correlations into a 4x4 matrix
R <- vec2mat(coef(res))
rownames(R) <- colnames(R) <- c("perf", "acog", "asom", "conf")
round(R, digits=3)

### check that order in vcov(res) corresponds to order in R
round(vcov(res), digits=4)

### fit regression model with 'perf' as outcome and 'acog', 'asom', and 'conf' as predictors
matreg(1, 2:4, R=R, V=vcov(res))

### can also specify variable names
matreg("perf", c("acog","asom","conf"), R=R, V=vcov(res))

## Not run: 
### repeat the above but with r-to-z transformed correlations
dat <- dat.craft2003
tmp <- rcalc(ri ~ var1 + var2 | study, ni=ni, data=dat, rtoz=TRUE)
V <- tmp$V
dat <- tmp$dat
dat$var1.var2 <- factor(dat$var1.var2,
   levels=c("acog.perf", "asom.perf", "conf.perf", "acog.asom", "acog.conf", "asom.conf"))
res <- rma.mv(yi, V, mods = ~ var1.var2 - 1, random = ~ var1.var2 | study, struct="UN", data=dat)
R <- vec2mat(coef(res))
rownames(R) <- colnames(R) <- c("perf", "acog", "asom", "conf")
matreg(1, 2:4, R=R, V=vcov(res), ztor=TRUE)

## End(Not run)

############################################################################

### a different example based on van Houwelingen et al. (2002)

### create dataset in long format
dat.long <- to.long(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg,
                    data=dat.colditz1994, append=FALSE)
dat.long <- escalc(measure="PLO", xi=out1, mi=out2, data=dat.long)
levels(dat.long$group) <- c("CON", "EXP")
dat.long

### fit bivariate model
res <- rma.mv(yi, vi, mods = ~ group - 1, random = ~ group | study, struct="UN",
              data=dat.long, method="ML")
res

### regression of log(odds)_EXP on log(odds)_CON
matreg(y=2, x=1, R=res$G, cov=TRUE, means=coef(res), n=res$g.levels.comb.k)

### but the SE of the CON coefficient is not computed correctly, since above we treat res$G as if
### it was a var-cov matrix computed from raw data based on res$g.levels.comb.k (= 13) data points

### fit bivariate model and get the var-cov matrix of the estimates in res$G
res <- rma.mv(yi, vi, mods = ~ group - 1, random = ~ group | study, struct="UN",
              data=dat.long, method="ML", cvvc="varcov", control=list(nearpd=TRUE))

### now use res$vvc as the var-cov matrix of the estimates in res$G
matreg(y=2, x=1, R=res$G, cov=TRUE, means=coef(res), V=res$vvc)

metafor documentation built on Sept. 28, 2023, 1:07 a.m.