meff: Estimate the Effective Number of Tests
In ozancinar/poolR: Methods for Pooling P-Values from (Dependent) Tests

meff	R Documentation


Description

Estimate the effective number of tests.

Usage

meff(R, eigen, method, ...)

Arguments

`R`	a \(k \times k\) symmetric matrix that reflects the correlation structure among the tests.
`eigen`	optional vector to directly supply the eigenvalues to the function (instead of computing them from the matrix given via `R`).
`method`	character string to specify the method to be used to estimate the effective number of tests (one of `"nyholt"`, `"liji"`, `"gao"`, `"galwey"`, or `"chen"`). See ‘Details’.
`...`	other arguments (e.g., `C` for `method = "gao"` or `method = "chen"`; see ‘Note’).

Details

The function estimates the effective number of tests using one of five methods. All methods except the one by Chen and Liu (2011) work with the eigenvalues of the \(R\) matrix supplied via the R argument (or with the eigenvalues passed directly via the eigen argument). Letting \(\lambda_i\) denote the \(i\)th eigenvalue of this matrix (with \(i = 1, \ldots, k\)), sorted in decreasing order, the effective number of tests (\(m\)) is estimated as follows.

Method by Nyholt (2004)

\[
m = 1 + (k - 1) \left(1 - \frac{\mathrm{Var}(\lambda)}{k}\right)
\]

where \(\mathrm{Var}(\lambda)\) is the observed sample variance of the \(k\) eigenvalues.
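
A minimal base R sketch of the Nyholt (2004) formula, written directly from the equation above. The helper nyholt_meff() is hypothetical (it is not part of poolr and is not the code inside meff()); it returns the unrounded value, so floor() is applied separately to mimic the rounding described at the end of this section.

```r
# Hypothetical helper (not part of poolr): Nyholt (2004) estimate, unrounded.
nyholt_meff <- function(R) {
  k <- nrow(R)
  # for a symmetric matrix, eigen() returns the eigenvalues in decreasing order
  lambda <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  # var() uses the k - 1 denominator, i.e., the observed sample variance
  1 + (k - 1) * (1 - var(lambda) / k)
}

R2 <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)  # two tests with correlation 0.5
nyholt_meff(diag(5))    # independent tests: 5
nyholt_meff(R2)         # eigenvalues 1.5 and 0.5: 1.75
floor(nyholt_meff(R2))  # rounded down: 1
```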

Method by Li & Ji (2005)

\[
m = \sum_{i=1}^{k} f(|\lambda_i|)
\]

where \(f(x) = I(x \ge 1) + (x - \lfloor x \rfloor)\), \(I(\cdot)\) is the indicator function, and \(\lfloor \cdot \rfloor\) is the floor function.
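
The Li & Ji (2005) formula can be sketched in base R as follows. This is an illustration under the definitions above, not poolr's implementation; liji_meff() is a hypothetical helper that returns the unrounded sum.

```r
# Hypothetical helper (not part of poolr): Li & Ji (2005) estimate, unrounded.
liji_meff <- function(R) {
  lambda <- abs(eigen(R, symmetric = TRUE, only.values = TRUE)$values)
  # f(x) = I(x >= 1) + (x - floor(x)), applied to each |lambda_i|
  f <- as.numeric(lambda >= 1) + (lambda - floor(lambda))
  sum(f)
}

R2 <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
liji_meff(diag(5))  # each eigenvalue 1 contributes f(1) = 1: 5
liji_meff(R2)       # f(1.5) + f(0.5) = 1.5 + 0.5 = 2
```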

Method by Gao et al. (2008)

\[
m = \min(x) \quad \text{such that} \quad \frac{\sum_{i=1}^{x} \lambda_i}{\sum_{i=1}^{k} \lambda_i} > C
\]

where \(C\) is a pre-defined threshold that is set to 0.995 by default but can be adjusted (see ‘Note’).
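
A short base R sketch of the Gao et al. (2008) rule: accumulate the decreasing eigenvalues until their share of the total exceeds C. The helper gao_meff() and its C argument are hypothetical and only mirror the formula above; they are not poolr's code.

```r
# Hypothetical helper (not part of poolr): Gao et al. (2008) estimate.
gao_meff <- function(R, C = 0.995) {
  lambda <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  # proportion of the eigenvalue sum accounted for by the first x eigenvalues
  prop <- cumsum(lambda) / sum(lambda)
  # smallest x whose proportion exceeds C (NA if none does, e.g., C >= 1)
  which(prop > C)[1]
}

R2 <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
gao_meff(diag(5))      # proportions 0.2, 0.4, ..., 1: 5
gao_meff(R2)           # proportions 0.75, 1: 2
gao_meff(R2, C = 0.7)  # 0.75 already exceeds 0.7: 1
```

Because the result is the position of an eigenvalue, it is always an integer, which is why no rounding is needed for this method.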

Method by Galwey (2009)

\[
m = \frac{\left(\sum_{i=1}^{k} \sqrt{\lambda_i'}\right)^2}{\sum_{i=1}^{k} \lambda_i'}
\]

where \(\lambda_i' = \max(0, \lambda_i)\).
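
The Galwey (2009) formula can be sketched in base R as below; negative eigenvalues are set to zero before taking square roots, exactly as in the definition of \(\lambda_i'\). The helper galwey_meff() is hypothetical and not part of poolr.

```r
# Hypothetical helper (not part of poolr): Galwey (2009) estimate, unrounded.
galwey_meff <- function(R) {
  lambda <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  lambda <- pmax(0, lambda)  # lambda_i' = max(0, lambda_i)
  sum(sqrt(lambda))^2 / sum(lambda)
}

R2 <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
galwey_meff(diag(5))  # 5
galwey_meff(R2)       # (sqrt(1.5) + sqrt(0.5))^2 / 2 = 1 + sqrt(3)/2, about 1.866
```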

Method by Chen & Liu (2011)

\[
m = \sum_{i=1}^{k} \frac{1}{R_i}
\]

where \(R_i = \sum_{j=1}^{k} |r_{ij}|^C\) for \(i = 1, \ldots, k\) and \(r_{ij}\) denotes the element in row \(i\) and column \(j\) of the \(R\) matrix. By default, \(C\) is set to 7, but this value can be adjusted (see ‘Note’).
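
Unlike the other four methods, the Chen & Liu (2011) estimate works on the matrix entries directly, so no eigen decomposition is needed. The base R sketch below follows the formula above; chen_meff() is a hypothetical helper, not poolr's implementation.

```r
# Hypothetical helper (not part of poolr): Chen & Liu (2011) estimate, unrounded.
chen_meff <- function(R, C = 7) {
  Ri <- rowSums(abs(R)^C)  # R_i = sum_j |r_ij|^C for each row i
  sum(1 / Ri)
}

R2 <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
chen_meff(diag(5))          # every R_i = 1: 5
chen_meff(matrix(1, 5, 5))  # every R_i = 5: 1
chen_meff(R2)               # 2 / (1 + 0.5^7) = 256/129, about 1.984
chen_meff(R2, C = 1)        # 2 / 1.5, about 1.333
```

The last two lines show the role of \(C\): a large exponent shrinks moderate correlations toward zero, so they barely reduce the estimate, while a small exponent lets them count more.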

Note: For all methods that can yield a non-integer estimate (all but the method by Gao et al., 2008), the resulting estimate \(m\) is rounded down to the nearest integer.

Specifying the R Matrix

The \(R\) matrix should reflect the dependence structure among the tests. There is no general solution for how such a matrix should be constructed, as this depends on the type of tests and their sidedness. One option is to use the correlations among the elements that vary across the analyses/tests (or a function thereof) as a proxy for the dependence structure. For example, when conducting \(k\) analyses with the same dependent variable and \(k\) different independent variables, the correlations among the independent variables could serve as such a proxy. Analogously, if analyses are conducted for \(k\) dependent variables with the same set of independent variables, the correlations among the dependent variables could be used instead.
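
As a sketch of the proxy approach for \(k\) analyses that share one dependent variable, the correlations among the \(k\) independent variables can be computed with cor(). The data below are simulated purely for illustration; the variable names, the sample size, and the built-in correlation of about 0.7 are assumptions, not taken from this documentation.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.7 * x1 + sqrt(1 - 0.7^2) * rnorm(n)  # built to correlate about 0.7 with x1
x3 <- rnorm(n)
x4 <- rnorm(n)

# k = 4 independent variables, each used in its own analysis of the same outcome
X <- cbind(x1, x2, x3, x4)
R <- cor(X)  # 4 x 4 proxy for the dependence among the 4 tests
round(R, 2)
# R could then be passed to the function, e.g., meff(R, method = "galwey")
```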

If the test statistics of the tests of interest can be assumed to follow a multivariate normal distribution and a matrix is available that reflects the correlations among the test statistics (which might be approximated by the correlations among the varying independent or dependent variables), then the mvnconv function can be used to convert this correlation matrix into the correlations among the (one- or two-sided) \(p\)-values, which can then be passed to the R argument. See ‘Examples’.
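
To see why the correlations among the test statistics and among the \(p\)-values differ, the sketch below simulates pairs of bivariate normal test statistics with correlation rho and correlates the resulting one- and two-sided \(p\)-values. This Monte Carlo illustration is not how mvnconv computes the conversion; rho = 0.6 and the number of simulations are arbitrary choices. For one-sided \(p\)-values, the result should be close to the known closed form \((6/\pi) \arcsin(\rho/2)\).

```r
set.seed(123)
n_sim <- 1e5
rho   <- 0.6

# bivariate normal test statistics with correlation rho
z1 <- rnorm(n_sim)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n_sim)

# one-sided (upper-tail) and two-sided p-values
p1_one <- pnorm(z1, lower.tail = FALSE)
p2_one <- pnorm(z2, lower.tail = FALSE)
p1_two <- 2 * pnorm(abs(z1), lower.tail = FALSE)
p2_two <- 2 * pnorm(abs(z2), lower.tail = FALSE)

r_one <- cor(p1_one, p2_one)
r_two <- cor(p1_two, p2_two)
r_one                   # close to the closed form below
6 / pi * asin(rho / 2)  # about 0.582
r_two                   # noticeably smaller than rho
```

The two-sided case shows that plugging the correlations among the test statistics directly into R can overstate the dependence among the \(p\)-values.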

Non-Positive Semi-Definite R

Depending on how \(R\) was constructed, this matrix may not be positive semi-definite, in which case some of its eigenvalues are negative. All of the methods described above can still be carried out in this case. Alternatively, one can first apply an algorithm that finds the nearest positive (semi-)definite matrix (e.g., Higham, 2002) and then pass the resulting matrix to the function (see the nearPD function from the Matrix package for a corresponding implementation).
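
A minimal sketch of that preprocessing step, using nearPD() from the Matrix package (a recommended package distributed with R). The 3 x 3 matrix is a hypothetical example of pairwise "correlations" that cannot all hold at once; the vector (1, -1, 1) is an eigenvector with eigenvalue -0.8.

```r
library(Matrix)  # provides nearPD()

# hypothetical matrix that is not positive semi-definite (eigenvalues 1.9, 1.9, -0.8)
R <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)
eigen(R, symmetric = TRUE, only.values = TRUE)$values

# nearest correlation matrix; corr = TRUE keeps a unit diagonal
R_pd <- as.matrix(nearPD(R, corr = TRUE)$mat)
eigen(R_pd, symmetric = TRUE, only.values = TRUE)$values  # no longer negative
# R_pd could then be passed to the function, e.g., meff(R_pd, method = "nyholt")
```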

Value

A scalar giving the estimate of the effective number of tests.

Note

For method = "gao", C = 0.995 by default, but a different value of C can be passed to the function via ... (e.g., meff(R, method = "gao", C = 0.95)). For method = "chen", C = 7 by default, but a different value of C can be passed to the function via ... (e.g., meff(R, method = "chen", C = 6)).

Author(s)

Ozan Cinar ozancinar86@gmail.com
Wolfgang Viechtbauer wvb@wvbauer.com

References

Chen, Z. X., & Liu, Q. Z. (2011). A new approach to account for the correlations among single nucleotide polymorphisms in genome-wide association studies. Human Heredity, 72(1), 1–9. https://doi.org/10.1159/000330135

Cinar, O., & Viechtbauer, W. (2022). The poolr package for combining independent and dependent p values. Journal of Statistical Software, 101(1), 1–42. https://doi.org/10.18637/jss.v101.i01

Gao, X., Starmer, J., & Martin, E. R. (2008). A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genetic Epidemiology, 32(4), 361–369. https://doi.org/10.1002/gepi.20310

Galwey, N. W. (2009). A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests. Genetic Epidemiology, 33(7), 559–568. https://doi.org/10.1002/gepi.20408

Higham, N. J. (2002). Computing the nearest correlation matrix: A problem from finance. IMA Journal of Numerical Analysis, 22(3), 329–343. https://doi.org/10.1093/imanum/22.3.329

Li, J., & Ji, L. (2005). Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity, 95(3), 221–227. https://doi.org/10.1038/sj.hdy.6800717

Nyholt, D. R. (2004). A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. American Journal of Human Genetics, 74(4), 765–769. https://doi.org/10.1086/383251

Examples

# load the poolr package
library(poolr)

# copy LD correlation matrix into r (see help(grid2ip) for details on these data)
r <- grid2ip.ld

# estimate the effective number of tests based on the LD correlation matrix
meff(r, method = "nyholt")
meff(r, method = "liji")
meff(r, method = "gao")
meff(r, method = "galwey")
meff(r, method = "chen")

# use mvnconv() to convert the LD correlation matrix into a matrix with the
# correlations among the (two-sided) p-values assuming that the test
# statistics follow a multivariate normal distribution with correlation
# matrix r (note: 'side = 2' by default in mvnconv())
rp <- mvnconv(r, target = "p", cov2cor = TRUE)
rp[1:5, 1:5] # show only rows/columns 1-5

# use this matrix instead for estimating the effective number of tests
meff(rp, method = "nyholt")
meff(rp, method = "liji")
meff(rp, method = "gao")
meff(rp, method = "galwey")
meff(rp, method = "chen")

ozancinar/poolR documentation built on July 4, 2025, 10:58 a.m.