Exv.rx: Moments of correlation coefficients
In watanabe-j/eigvaldisp: Statistics of Eigenvalue Dispersion Indices

Exv.rx

R Documentation

Moments of correlation coefficients

Description

Internal functions to calculate exact and asymptotic moments of sample correlation coefficients r from population correlation coefficients

Exv.r1(): exact expectation of r

Exv.r2(): exact expectation of r^2

Exv.r3(): exact expectation of r^3

Exv.r3d(): alternative to Exv.r3()

Exv.r4(): exact expectation of r^4

Exv.r4d(): alternative to Exv.r4()

Var.r1(): exact variance of r

Var.r2(): exact variance of r^2

Cm3.r1(): exact central third moment of r

Cm4.r1(): exact central fourth moment of r

AExv.r1(): asymptotic expectation of r

AVar.r1(): asymptotic variance of r

ACm3.r1(): asymptotic central third moment of r

ACm4.r1(): asymptotic central fourth moment of r

AExv.r2(): asymptotic expectation of r^2

AExv.r3(): asymptotic expectation of r^3

AExv.r4(): asymptotic expectation of r^4

AVar.r2(): asymptotic variance of r^2

ACov.r1(): asymptotic covariance between two r's

Usage

Exv.r1(n, x, tol = .Machine$double.eps * 100, tol.hg = 0, maxiter = 2000, ...)

Exv.r2(
  n,
  x,
  do.square = TRUE,
  tol = .Machine$double.eps * 100,
  tol.hg = 0,
  maxiter = 2000,
  ...
)

Exv.r3(n, x, tol = .Machine$double.eps * 100, tol.hg = 0, maxiter = 2000, ...)

Exv.r3d(n, x, tol = .Machine$double.eps * 100, tol.hg = 0, maxiter = 2000, ...)

Exv.r4(
  n,
  x,
  do.square = TRUE,
  tol = .Machine$double.eps * 100,
  tol.hg = 0,
  maxiter = 2000,
  ...
)

Exv.r4d(
  n,
  x,
  do.square = TRUE,
  tol = .Machine$double.eps * 100,
  tol.hg = 0,
  maxiter = 2000,
  ...
)

Var.r1(n, x, tol = .Machine$double.eps * 100, tol.hg = 0, maxiter = 2000, ...)

Var.r2(
  n,
  x,
  do.square = TRUE,
  tol = .Machine$double.eps * 100,
  tol.hg = 0,
  maxiter = 2000,
  ...
)

Cm3.r1(n, x, tol = .Machine$double.eps * 100, tol.hg = 0, maxiter = 2000, ...)

Cm4.r1(n, x, tol = .Machine$double.eps * 100, tol.hg = 0, maxiter = 2000, ...)

AExv.r1(n, x, order. = 2, return_terms = FALSE, ...)

AVar.r1(n, x, order. = 2, return_terms = FALSE, ...)

ACm3.r1(n, x, order. = 3, return_terms = FALSE, ...)

ACm4.r1(n, x, order. = 3, return_terms = FALSE, ...)

AExv.r2(n, x, order. = 2, return_terms = FALSE, ...)

AExv.r3(n, x, order. = 2, return_terms = FALSE, ...)

AExv.r4(n, x, order. = 2, return_terms = FALSE, ...)

AVar.r2(n, x, order. = 2, return_terms = FALSE, ...)

ACov.r1(
  n,
  Rho,
  ind1 = c(1, 2),
  ind2 = c(1, 2),
  i = ind1[1],
  j = ind1[2],
  k = ind2[1],
  l = ind2[2],
  rij = Rho[i, j],
  rik = Rho[i, k],
  ril = Rho[i, l],
  rjk = Rho[j, k],
  rjl = Rho[j, l],
  rkl = Rho[k, l],
  ...
)

Arguments

`n`	Degrees of freedom (not sample size), numeric of length 1.
`x`	Population correlation coefficient, numeric of length 1 or more. For some functions, squared correlation coefficients can be passed via this argument when `do.square = FALSE`.
`tol`	Tolerance used to judge wheter `x` is sufficiently close to `0` or `\pm 1` (where numerical instability can happen).
`tol.hg, maxiter`	Passed to `hgf()` as tol and maxiter there.
`...`	Additional arguments in these functions are silently ignored.
`do.square`	Whether `x` is to be squared before passed to `hgf()`; `TRUE` by default; set this `FALSE` when `x` is already squared.
`order.`	Order of asymptotic approximation (exponent `k` in `O(n^k)`; for asymptotic expressions only).
`return_terms`	When TRUE, returns individual terms of different orders (for asymptotic expressions only). Related mainly for debugging purposes, but could be used to track the order in derived statistics.
`Rho`	Correlation matrix; assumed to be validly constructed.
`ind1, ind2`	Indices for the two focal coefficients (vectors of length 2). Superseded by `i, j, k, l`.
`i, j, k, l`	Alternative indices to specify the focal coefficients. Superseded by `rij, rik, ril, rjk, rjl, rkl`.
`rij, rik, ril, rjk, rjl, rkl`	The relevant population correlation coefficients.

Details

All these functions except ACov.r1() requires the arguments n and x. It can be length(x) > 1 (at least supeficially vectorized), but it is assumed length(n) = 1. This is because these functions are often called in the form sapply(n, ...) in other functions (Exv.VXX).

Covariance between two r's is a function of at least six population correlation coefficients. ACov.r1() takes as an argument the entire population correlation matrix R instead of individual correlation coefficients, and indices for the focal correlation coefficients. Alternatively, the all six relevant population correlation coefficients can be directly specified.

The exact expressions are from Soper et al. (1917) or Ghosh (1966). In particular:

Exv.r3(): From Soper et al. (1917, eq. xxviii); that of Ghosh (1966) seems inaccurate for this moment.
Exv.r3d(): Alternative to Exv.r3() from Soper et al. (1917, eq. xxvi). Equivalent but slightly slower.
Exv.r4(): From Ghosh (1966, eq. 1).
Exv.r4d(): Alternative to Exv.r4() from Soper et al. (1917, eq. xxi). Equivalent but slightly slower.

Some of the exact expressions might be instable when the x is small.

The asymptotic moments are from Ghosh (1966) (note that the symbol n is used for sample size there). Note: The validity of ACm3.r1() may need critical evaluation, as Ghosh's (1966) equation 1 (from which the result is supposedly drawn) seems inaccurate (see Soper et al. [1917: eq. xxviii]). The following functions depending on this might be inaccurate: AExv.r3(), AExv.r4(), and AVar.r2().

The asymptotic moment functions takes the argument order.. This is the exponent in O(n^k). Higher orders do not always yield better approximations for ordinary values of n. The allowed ranges and “good” choices (empirically confirmed by comparison to exact expressions) are as follows:

AExv.r1(), AExv.r2(): 1–7. 2 is good except when n < 6 where 3 is better.
AVar.r1(): 1–7. 2 is good for n <100 or so; for larger n, the difference tends to be negligible.
ACm3.r1(): 2–5. 3 is best for n > 15, while 4 may work better for smaller n.
ACm4.r1(): 2–5. 3 is good for n > 20 especially when x is small; 4 is better for small n and x > 0.5.
AExv.r3(): 1–5. 2 is good.
AExv.r4(): 1–5. 3 or 4 is good for small x, while 2 is better for x > 0.4.
AVar.r2(): 1–5. For n < 100, 3 or 4 is good for very small x, while 2 is better for x > 0.2. Slight overestimation seems to be common.

ACov.r1() is as in, e.g., Olkin and Siotani (1976), Olkin and Finn (1990). When ind1 == ind2, it calculates the variance for the coefficient.

Multivariate normality is assumed for all moments. Nevertheless, the distribution of r remains the same in many other conditions (e.g., in all elliptically contoured distributions; Anderson, 2003).

References

Anderson, T. W. (2003) An Introduction to Multivariate Statistical Analysis, 3rd edition. John Wiley & Sons, Hoboken, New Jersey.

Ghosh, B. K. (1966) Expansions for the moments of the distribution of correlation coefficients. Biometrika 53, 258–262. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.2307/2334076")}.

Olkin, I. and Finn, J. D. (1990) Testing correlated correlations. Psychological Bulletin 108, 330–333. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1037/0033-2909.108.2.330")}.

Olkin, I. and Siotani, M. (1976) Asymptotic distribution of functions of a correlation matrix. Pp. 235–251 in Editorial Committee for Publication of Essays in Probability and Statistics, eds. Essays in Probability and Statistics in Honor of Professor Junjiro Ogawa. Shinko Tsusho, Tokyo.

Soper, H. E., Young, A. W., Cave, B. M., Lee, A. and Pearson, K. (1917) On the distribution of the correlation coefficients in small samples. Appendix II to the papers of “Student” and R. A. Fisher. Biometrika 11, 328–413. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1093/biomet/11.4.328")}.

Examples

# Trivial examples
n <- 10    # Which typically corresponds to sample size of 11
eigvaldisp:::Exv.r1(n, 0)
eigvaldisp:::Exv.r1(n, 0.5) # Underestimate
eigvaldisp:::Exv.r2(n, 0)   # This is 1/n for x = 0

eigvaldisp:::AExv.r1(n, 0)
eigvaldisp:::AExv.r1(n, 0.5)
eigvaldisp:::AExv.r2(n, 0)  # Fairly close approximations of the above

# Comparison between exact vs. asymptotic expressions
n <- 3                 # Extremely small n
x <- seq(0, 1, 0.01)
cols <- rainbow(7)

# Exv.r1() vs AExv.r1() with different orders
plot(x, sapply(x, eigvaldisp:::Exv.r1, n = n), type = "l", lwd = 2,
     xlim = c(0, 1), ylim = c(0, 1))
for(i in 1:7) {
    lines(x, sapply(x, eigvaldisp:::AExv.r1, n = n, order = i),
          col = cols[i])
}
legend("topleft", title = "Order", legend = c(1:7, "Exact"),
       col = c(cols, "black"), lty = 1)

# Exv.r2() vs AExv.r2() with different orders
plot(x, sapply(x, eigvaldisp:::Exv.r2, n = n), type = "l", lwd = 2,
     xlim = c(0, 1), ylim = c(0, 1))
for(i in 1:7) {
    lines(x, sapply(x, eigvaldisp:::AExv.r2, n = n, order = i),
          col = cols[i])
}
legend("topleft", title = "Order", legend = c(1:7, "Exact"),
       col = c(cols, "black"), lty = 1)

watanabe-j/eigvaldisp documentation built on Dec. 8, 2023, 4:38 a.m.