AVar.VRR_xx: Approximate variance of relative eigenvalue variance of...
In watanabe-j/eigvaldisp: Statistics of Eigenvalue Dispersion Indices

AVar.VRR_xx

R Documentation

Approximate variance of relative eigenvalue variance of correlation matrix

Description

Functions to obtain approximate variance of relative eigenvalue variance of correlation matrix \mathrm{Var}[V_{\mathrm{rel}}(\mathbf{R})]. There are several versions for each of two different expressions: pf* and k* families.

AVar.VRR_pf(): asymptotic and approximate variance of V_{\mathrm{rel}}(\mathbf{R}) based on Pan and Frank's (2004) approach. Prototype version.

AVar.VRR_pfv(): vectorized version of AVar.VRR_pf(); much faster, but requires a large RAM space as p grows

AVar.VRR_pfd(): further improvement over AVar.VRR_pfv()

AVar.VRR_pfc(): fast version using Rcpp; requires the extension package eigvaldispRcpp

AVar.VRR_kl(): asymptotic variance from Konishi's theory: V_{\mathrm{rel}}(\mathbf{R}) as function of eigenvalues

AVar.VRR_klv(): vectorized version of AVar.VRR_kl()

AVar.VRR_kr(): asymptotic variance from Konishi's theory: V_{\mathrm{rel}}(\mathbf{R}) as function of correlation coefficients

AVar.VRR_krv(): vectorized version of AVar.VRR_kr()

Usage

AVar.VRR_pf(
  Rho,
  n = 100,
  Lambda,
  exv1.mode = c("exact", "asymptotic"),
  var1.mode = "asymptotic",
  var2.mode = c("exact", "asymptotic"),
  order.exv1 = 2,
  order.var2 = 2,
  mode = c("for.ind", "nested.for", "lapply"),
  ...
)

AVar.VRR_pfv(
  Rho,
  n = 100,
  Lambda,
  exv1.mode = c("exact", "asymptotic"),
  var1.mode = "asymptotic",
  var2.mode = c("exact", "asymptotic"),
  order.exv1 = 2,
  order.var2 = 2,
  ...
)

AVar.VRR_pfd(
  Rho,
  n = 100,
  Lambda,
  exv1.mode = c("exact", "asymptotic"),
  var1.mode = "asymptotic",
  var2.mode = c("exact", "asymptotic"),
  order.exv1 = 2,
  order.var2 = 2,
  mode = c("for.ind", "lapply", "mclapply", "parLapply"),
  mc.cores = "auto",
  max.cores = parallel::detectCores(),
  do.mcaffinity = TRUE,
  affinity_mc = seq_len(max.cores),
  cl = NULL,
  max.size = 2e+06,
  verbose = c("no", "yes", "inline"),
  ...
)

AVar.VRR_pfc(
  Rho,
  n = 100,
  Lambda,
  cppfun = "Cov_r2C",
  nthreads = 0L,
  exv1.mode = c("exact", "asymptotic"),
  var2.mode = c("exact", "asymptotic"),
  order.exv1 = 2,
  order.var2 = 2,
  ...
)

AVar.VRR_kl(Rho, n = 100, Lambda, ...)

AVar.VRR_klv(Rho, n = 100, Lambda, ...)

AVar.VRR_kr(Rho, n = 100, Lambda, ...)

AVar.VRR_krv(Rho, n = 100, Lambda, ...)

Arguments

`Rho`	Population correlation matrix; assumed to be validly constructed (although simple checks are done).
`n`	Degrees of freedom (not sample sizes); numeric of length 1 or more.
`Lambda`	Numeric vector of population eigenvalues.
`exv1.mode`	Whether `"exact"` or `"asymptotic"` expression is used for `\mathrm{E}(r)`.
`var1.mode`	Whether `"exact"` or `"asymptotic"` expression is used for `\mathrm{Cov}(r_{ij}, r_{kl})`. (At present, only `"asymptotic"` is allowed.)
`var2.mode`	Whether `"exact"` or `"asymptotic"` expression is used for `\mathrm{Var}(r^2)`.
`order.exv1, order.var2`	Used to specify the order of evaluation for asymptotic expressions of `\mathrm{E}(r)` and `\mathrm{Var}(r^2)` when `exv1.mode` and `var2.mode` is `"asymptotic"`; see `Exv.rx`.
`mode`	In `AVar.VRR_pf()` and `AVar.VRR_pfd()`, specifies the mode of iterations (see Details).
`...`	In the `pf` family functions, passed to `Exv.r1()` and `Var.r2()` (when the corresponding modes are `"exact"`). Otherwise ignored.
`mc.cores`	Number of cores to be used (numeric/integer). When `"auto"` (default), set to `min(c(ceiling(p / 2), max.cores))`, which usually works well. (Used only when `mode = "mclapply"`, or `"parLapply"`)
`max.cores`	Maximum number of cores to be used. (Used only when `mode = "mclapply"`, or `"parLapply"`)
`do.mcaffinity`	Whether to run `parallel::mcaffinity()`, which seems required in some Linux environments to assign threads to multiple cores. (Used only when `mode = "mclapply"`, or `"parLapply"`)
`affinity_mc`	Argument of `parallel::mcaffinity()` to specify assignment of threads. (Used only when `mode = "mclapply"`, or `"parLapply"`)
`cl`	A cluster object (made by `parallel::makeCluster()`); when already created, one can be specified with this argument. Otherwise, one is created within function call, which is turned off on exit. (Used only when `mode = "parLapply"`)
`max.size`	Maximum size of vectors created internally (see Details).
`verbose`	When `"yes"` or `"inline"`, pogress of iteration is printed on console. `"no"` (default) turns off the printing. To be used in `AVar.VRR_pfd()` for large `p` (hundreds or more).
`cppfun`	Option to specify the C++ function to be used (see Details).
`nthreads`	Integer to specify the number of threads used in OpenMP parallelization in `"Cov_r2A"` and `"Cov_r2E"`. By default (`0`), the number of threads is automatically set to one-half of that of (logical) processors detected (by the C++ function `omp_get_num_procs()`). Setting this beyond the number of physical processors can result in poorer performance, depending on the environment.

Details

Watanabe (2022) presented two approaches to evaluate approximate variance of the relative eigenvalue variance of a correlation matrix V_{\mathrm{rel}}(\mathbf{R}). One is Pan and Frank's (2004) heuristic approximation (eqs. 28 and 36–38 in Watanabe 2022). The other is based on Konishi's (1979) asymptotic theory (eq. 39 in Watanabe 2022). Simulations showed that the former tends to be more accurate, but the latter is much faster. This is mainly because the Pan–Frank approach involves evaluation of covariances in ~p^4 / 4 pairs of (squared) correlation coefficients. (That said, the speed will not be a practical concern unless p exceeds a few hundreds.)

The Pan–Frank approach is at present implemented in several functions which yield (almost) identical results:

AVar.VRR_pf(): Prototype version. Simplest implementation.
AVar.VRR_pfv(): Vectorized version. Much faster, but requires a large RAM space as p grows.
AVar.VRR_pfd(): Improvement over AVar.VRR_pfv(). Faster and more RAM-efficient. This is the default to be called in Var.VRR(..., method = "Pan-Frank"), unless the extension package eigvaldispRcpp is installed.
AVar.VRR_pfc(): Fast version using Rcpp. Requires the extension package eigvaldispRcpp; this is the default when this package is installed (and detected).

AVar.VRR_pfc() implements the same algorithm as the others, but makes use of C++ API via the package Rcpp for evaluation of the sum of covariance across pairs of squared correlation coefficients. This version works much faster than vectorized R implementations. Note that the output can slightly differ from those of pure R implementations (by the order of ~1e-9).

The Konishi approach is implemented in several functions:

AVar.VRR_kl(): From Konishi (1979: corollary 2.2): V_{\mathrm{rel}}(\mathbf{R}) as function of eigenvalues. Prototype version.
AVar.VRR_klv(): Vectorized version of AVar.VRR_kl(). This is the default when Var.VRR(..., method = "Konishi").
AVar.VRR_kr(): From Konishi (1979: theorem 6.2): V_{\mathrm{rel}}(\mathbf{R}) as function of correlation coefficients.
AVar.VRR_krv(): Vectorized version of AVar.VRR_kr(); slightly faster for moderate p, but not particularly fast for large p as the number of elements to be summed becomes large.

Empirically, these all yield the same result, but AVar.VRR_klv() is by far the fastest.

The AVar.VRR_pfx() family functions by default return exact variance when p = 2, If asymptotic result is desired, use mode.var2 = "asymptotic".

Options for mode in AVar.VRR_pf() and AVar.VRR_pfd():

"nested.for": Only for AVar.VRR_pf(). Uses nested for loops, which is straifhgforward and RAM efficient but slow.
"for.ind" (default)/"lapply": Run the iteration along an index vector to shorten computational time, with for loop and lapply(), respectively.
"mclapply"/"parLapply": Only for AVar.VRR_pfd(). Parallelize the same iteration by forking and socketing, respectively, with the named functions in the package parallel. Note that the former doesn't work in the Windows environment. See vignette("parallel") for details.

AVar.VRR_pfv() internally generates vectors and matrices whose lengths are about p^4 / 8 and p^4 / 4. These take about 2p^4 bytes of RAM; this can be prohibitively large for large p.

AVar.VRR_pfd() divides index vectors used internally in AVar.VRR_pfv() into lists, along whose elements calculations are done to save RAM space. The argument max.size controls the maximum size of resulting vectors; at least max.size * (2 * length(n) + 6) * 8 bytes of RAM is required for storing temporary results (and more during computation); e.g., ~2e7 seems good for 16 GB RAM, ~4e8 for 256 GB. However, performance does not seem to improve past 1e6–1e7 presumably because memory allocation takes substantial time for large objects. The iteration can be parallelized with mode = "mclapply" or "parLapply", but be careful about RAM limitations.

AVar.VRR_pfc() provides a faster implementation with one of the C++ functions defined in the extension package eigvaldispRcpp (which is required to run this function). The C++ function is specified by the argument cppfun:

"Cov_r2C" (default): Serial evaluation with base Rcpp functionalities.
"Cov_r2A" or "Armadillo": Using RcppArmadillo. Parallelized with OpenMP when the environment allows.
"Cov_r2E" or "Eigen": Using RcppEigen. Parallelized with OpenMP when the environment allows.
"Cov_r2P" or "Parallel": Using RcppParallel. Parallelized with IntelTBB when the environment allows.

The default option would be sufficiently fast for up to p = 100 or so. The latter three options aim at speeding-up the calculation via parallelization with other Rcpp-related packages. Although these would have similar performance in most environments, "Cov_r2E" seems the fastest in the development environment, closely followed by "Cov_r2A".

Value

A numeric vector of \mathrm{Var}[V_{\mathrm{rel}}(\mathbf{R})], corresponding to n.

References

Konishi, S. (1979) Asymptotic expansions for the distributions of statistics based on the sample correlation matrix in principal componenet analysis. Hiroshima Mathematical Journal 9, 647–700. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.32917/hmj/1206134750")}.

Pan, W. and Frank, K. A. (2004) An approximation to the distribution of the product of two dependent correlation coefficients. Journal of Statistical Computation and Simulation 74, 419–443. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/00949650310001596822")}.

Watanabe, J. (2022) Statistics of eigenvalue dispersion indices: quantifying the magnitude of phenotypic integration. Evolution, 76, 4–28. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1111/evo.14382")}.

Examples

# See also examples of Exv.VXX
# Correlation matrix
N <- 20
Lambda <- c(4, 2, 1, 1)
(Rho <- GenCov(evalues = Lambda / sum(Lambda) * 4, evectors = "Givens"))

# Different choices for asymptotic variance of Vrel(R)
# Variance from Pan-Frank method
Var.VRR(Rho, n = N - 1) # By default, method = "Pan-Frank" and AVar.VRR_pfd() is called
eigvaldisp:::AVar.VRR_pfd(Rho, n = N - 1) # Same as above
Var.VRR(Rho, n = N - 1, fun = "pf")  # Calls AVar.VRR_pf(), which is slow for large p
Var.VRR(Rho, n = N - 1, fun = "pfv") # Calls AVar.VRR_pfv(), which requires much RAM for large p
# Various implementations with Rcpp (require eigvaldispRcpp):
## Not run: Var.VRR(Rho, n = N - 1, fun = "pfc") # By default, cppfun = "Cov_r2C" is used
## Not run: Var.VRR(Rho, n = N - 1, cppfun = "Cov_r2A")
## Not run: Var.VRR(Rho, n = N - 1, cppfun = "Cov_r2E")
## Not run: Var.VRR(Rho, n = N - 1, cppfun = "Cov_r2P")
# When the argument cppfun is provided, fun need not be specified
# The above results are identical

# Variance from Konishi's theory
Var.VRR(Rho, n = N - 1, method = "Konishi")  # By default for this method, AVar.VRR_klv() is called
eigvaldisp:::AVar.VRR_klv(Rho, n = N - 1)    # Same as above
Var.VRR(Rho, n = N - 1, fun = "kl")
Var.VRR(Rho, n = N - 1, fun = "kr")
Var.VRR(Rho, n = N - 1, fun = "krv")
# The results are identical, but the last three are slower
# On the other hand, these differ from that obtained with the Pan-Frank method

# Example with p = 2
Rho2 <- GenCov(evalues = c(1.5, 0.5), evectors = "Givens")
Var.VRR(Rho2, n = N - 1)  # When p = 2, this does not call AVar.VRR_pfd()
eigvaldisp:::AVar.VRR_pfd(Rho2, n = N - 1)
# By default, the above returns the same, exact result

eigvaldisp:::AVar.VRR_pfd(Rho2, n = N - 1, var2.mode = "asymptotic")
eigvaldisp:::AVar.VRR_klv(Rho2, n = N - 1)
# These return different asymptotic expressions

watanabe-j/eigvaldisp documentation built on Dec. 8, 2023, 4:38 a.m.