AVar.VRR_xx: Approximate variance of relative eigenvalue variance of...

AVar.VRR_xxR Documentation

Approximate variance of relative eigenvalue variance of correlation matrix

Description

Functions to obtain approximate variance of relative eigenvalue variance of correlation matrix \mathrm{Var}[V_{\mathrm{rel}}(\mathbf{R})]. There are several versions for each of two different expressions: pf* and k* families.

AVar.VRR_pf(): asymptotic and approximate variance of V_{\mathrm{rel}}(\mathbf{R}) based on Pan and Frank's (2004) approach. Prototype version.

AVar.VRR_pfv(): vectorized version of AVar.VRR_pf(); much faster, but requires a large RAM space as p grows

AVar.VRR_pfd(): further improvement over AVar.VRR_pfv()

AVar.VRR_pfc(): fast version using Rcpp; requires the extension package eigvaldispRcpp

AVar.VRR_kl(): asymptotic variance from Konishi's theory: V_{\mathrm{rel}}(\mathbf{R}) as function of eigenvalues

AVar.VRR_klv(): vectorized version of AVar.VRR_kl()

AVar.VRR_kr(): asymptotic variance from Konishi's theory: V_{\mathrm{rel}}(\mathbf{R}) as function of correlation coefficients

AVar.VRR_krv(): vectorized version of AVar.VRR_kr()

Usage

AVar.VRR_pf(
  Rho,
  n = 100,
  Lambda,
  exv1.mode = c("exact", "asymptotic"),
  var1.mode = "asymptotic",
  var2.mode = c("exact", "asymptotic"),
  order.exv1 = 2,
  order.var2 = 2,
  mode = c("for.ind", "nested.for", "lapply"),
  ...
)

AVar.VRR_pfv(
  Rho,
  n = 100,
  Lambda,
  exv1.mode = c("exact", "asymptotic"),
  var1.mode = "asymptotic",
  var2.mode = c("exact", "asymptotic"),
  order.exv1 = 2,
  order.var2 = 2,
  ...
)

AVar.VRR_pfd(
  Rho,
  n = 100,
  Lambda,
  exv1.mode = c("exact", "asymptotic"),
  var1.mode = "asymptotic",
  var2.mode = c("exact", "asymptotic"),
  order.exv1 = 2,
  order.var2 = 2,
  mode = c("for.ind", "lapply", "mclapply", "parLapply"),
  mc.cores = "auto",
  max.cores = parallel::detectCores(),
  do.mcaffinity = TRUE,
  affinity_mc = seq_len(max.cores),
  cl = NULL,
  max.size = 2e+06,
  verbose = c("no", "yes", "inline"),
  ...
)

AVar.VRR_pfc(
  Rho,
  n = 100,
  Lambda,
  cppfun = "Cov_r2C",
  nthreads = 0L,
  exv1.mode = c("exact", "asymptotic"),
  var2.mode = c("exact", "asymptotic"),
  order.exv1 = 2,
  order.var2 = 2,
  ...
)

AVar.VRR_kl(Rho, n = 100, Lambda, ...)

AVar.VRR_klv(Rho, n = 100, Lambda, ...)

AVar.VRR_kr(Rho, n = 100, Lambda, ...)

AVar.VRR_krv(Rho, n = 100, Lambda, ...)

Arguments

Rho

Population correlation matrix; assumed to be validly constructed (although simple checks are done).

n

Degrees of freedom (not sample sizes); numeric of length 1 or more.

Lambda

Numeric vector of population eigenvalues.

exv1.mode

Whether "exact" or "asymptotic" expression is used for \mathrm{E}(r).

var1.mode

Whether "exact" or "asymptotic" expression is used for \mathrm{Cov}(r_{ij}, r_{kl}). (At present, only "asymptotic" is allowed.)

var2.mode

Whether "exact" or "asymptotic" expression is used for \mathrm{Var}(r^2).

order.exv1, order.var2

Used to specify the order of evaluation for asymptotic expressions of \mathrm{E}(r) and \mathrm{Var}(r^2) when exv1.mode and var2.mode is "asymptotic"; see Exv.rx.

mode

In AVar.VRR_pf() and AVar.VRR_pfd(), specifies the mode of iterations (see Details).

...

In the pf family functions, passed to Exv.r1() and Var.r2() (when the corresponding modes are "exact"). Otherwise ignored.

mc.cores

Number of cores to be used (numeric/integer). When "auto" (default), set to min(c(ceiling(p / 2), max.cores)), which usually works well. (Used only when mode = "mclapply", or "parLapply")

max.cores

Maximum number of cores to be used. (Used only when mode = "mclapply", or "parLapply")

do.mcaffinity

Whether to run parallel::mcaffinity(), which seems required in some Linux environments to assign threads to multiple cores. (Used only when mode = "mclapply", or "parLapply")

affinity_mc

Argument of parallel::mcaffinity() to specify assignment of threads. (Used only when mode = "mclapply", or "parLapply")

cl

A cluster object (made by parallel::makeCluster()); when already created, one can be specified with this argument. Otherwise, one is created within function call, which is turned off on exit. (Used only when mode = "parLapply")

max.size

Maximum size of vectors created internally (see Details).

verbose

When "yes" or "inline", pogress of iteration is printed on console. "no" (default) turns off the printing. To be used in AVar.VRR_pfd() for large p (hundreds or more).

cppfun

Option to specify the C++ function to be used (see Details).

nthreads

Integer to specify the number of threads used in OpenMP parallelization in "Cov_r2A" and "Cov_r2E". By default (0), the number of threads is automatically set to one-half of that of (logical) processors detected (by the C++ function omp_get_num_procs()). Setting this beyond the number of physical processors can result in poorer performance, depending on the environment.

Details

Watanabe (2022) presented two approaches to evaluate approximate variance of the relative eigenvalue variance of a correlation matrix V_{\mathrm{rel}}(\mathbf{R}). One is Pan and Frank's (2004) heuristic approximation (eqs. 28 and 36–38 in Watanabe 2022). The other is based on Konishi's (1979) asymptotic theory (eq. 39 in Watanabe 2022). Simulations showed that the former tends to be more accurate, but the latter is much faster. This is mainly because the Pan–Frank approach involves evaluation of covariances in ~p^4 / 4 pairs of (squared) correlation coefficients. (That said, the speed will not be a practical concern unless p exceeds a few hundreds.)

The Pan–Frank approach is at present implemented in several functions which yield (almost) identical results:

AVar.VRR_pf()

Prototype version. Simplest implementation.

AVar.VRR_pfv()

Vectorized version. Much faster, but requires a large RAM space as p grows.

AVar.VRR_pfd()

Improvement over AVar.VRR_pfv(). Faster and more RAM-efficient. This is the default to be called in Var.VRR(..., method = "Pan-Frank"), unless the extension package eigvaldispRcpp is installed.

AVar.VRR_pfc()

Fast version using Rcpp. Requires the extension package eigvaldispRcpp; this is the default when this package is installed (and detected).

AVar.VRR_pfc() implements the same algorithm as the others, but makes use of C++ API via the package Rcpp for evaluation of the sum of covariance across pairs of squared correlation coefficients. This version works much faster than vectorized R implementations. Note that the output can slightly differ from those of pure R implementations (by the order of ~1e-9).

The Konishi approach is implemented in several functions:

AVar.VRR_kl()

From Konishi (1979: corollary 2.2): V_{\mathrm{rel}}(\mathbf{R}) as function of eigenvalues. Prototype version.

AVar.VRR_klv()

Vectorized version of AVar.VRR_kl(). This is the default when Var.VRR(..., method = "Konishi").

AVar.VRR_kr()

From Konishi (1979: theorem 6.2): V_{\mathrm{rel}}(\mathbf{R}) as function of correlation coefficients.

AVar.VRR_krv()

Vectorized version of AVar.VRR_kr(); slightly faster for moderate p, but not particularly fast for large p as the number of elements to be summed becomes large.

Empirically, these all yield the same result, but AVar.VRR_klv() is by far the fastest.

The AVar.VRR_pfx() family functions by default return exact variance when p = 2, If asymptotic result is desired, use mode.var2 = "asymptotic".

Options for mode in AVar.VRR_pf() and AVar.VRR_pfd():

"nested.for"

Only for AVar.VRR_pf(). Uses nested for loops, which is straifhgforward and RAM efficient but slow.

"for.ind" (default)/"lapply"

Run the iteration along an index vector to shorten computational time, with for loop and lapply(), respectively.

"mclapply"/"parLapply"

Only for AVar.VRR_pfd(). Parallelize the same iteration by forking and socketing, respectively, with the named functions in the package parallel. Note that the former doesn't work in the Windows environment. See vignette("parallel") for details.

AVar.VRR_pfv() internally generates vectors and matrices whose lengths are about p^4 / 8 and p^4 / 4. These take about 2p^4 bytes of RAM; this can be prohibitively large for large p.

AVar.VRR_pfd() divides index vectors used internally in AVar.VRR_pfv() into lists, along whose elements calculations are done to save RAM space. The argument max.size controls the maximum size of resulting vectors; at least max.size * (2 * length(n) + 6) * 8 bytes of RAM is required for storing temporary results (and more during computation); e.g., ~2e7 seems good for 16 GB RAM, ~4e8 for 256 GB. However, performance does not seem to improve past 1e61e7 presumably because memory allocation takes substantial time for large objects. The iteration can be parallelized with mode = "mclapply" or "parLapply", but be careful about RAM limitations.

AVar.VRR_pfc() provides a faster implementation with one of the C++ functions defined in the extension package eigvaldispRcpp (which is required to run this function). The C++ function is specified by the argument cppfun:

"Cov_r2C" (default)

Serial evaluation with base Rcpp functionalities.

"Cov_r2A" or "Armadillo"

Using RcppArmadillo. Parallelized with OpenMP when the environment allows.

"Cov_r2E" or "Eigen"

Using RcppEigen. Parallelized with OpenMP when the environment allows.

"Cov_r2P" or "Parallel"

Using RcppParallel. Parallelized with IntelTBB when the environment allows.

The default option would be sufficiently fast for up to p = 100 or so. The latter three options aim at speeding-up the calculation via parallelization with other Rcpp-related packages. Although these would have similar performance in most environments, "Cov_r2E" seems the fastest in the development environment, closely followed by "Cov_r2A".

Value

A numeric vector of \mathrm{Var}[V_{\mathrm{rel}}(\mathbf{R})], corresponding to n.

References

Konishi, S. (1979) Asymptotic expansions for the distributions of statistics based on the sample correlation matrix in principal componenet analysis. Hiroshima Mathematical Journal 9, 647–700. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.32917/hmj/1206134750")}.

Pan, W. and Frank, K. A. (2004) An approximation to the distribution of the product of two dependent correlation coefficients. Journal of Statistical Computation and Simulation 74, 419–443. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/00949650310001596822")}.

Watanabe, J. (2022) Statistics of eigenvalue dispersion indices: quantifying the magnitude of phenotypic integration. Evolution, 76, 4–28. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1111/evo.14382")}.

See Also

Exv.VXX for main moment functions

Examples

# See also examples of Exv.VXX
# Correlation matrix
N <- 20
Lambda <- c(4, 2, 1, 1)
(Rho <- GenCov(evalues = Lambda / sum(Lambda) * 4, evectors = "Givens"))

# Different choices for asymptotic variance of Vrel(R)
# Variance from Pan-Frank method
Var.VRR(Rho, n = N - 1) # By default, method = "Pan-Frank" and AVar.VRR_pfd() is called
eigvaldisp:::AVar.VRR_pfd(Rho, n = N - 1) # Same as above
Var.VRR(Rho, n = N - 1, fun = "pf")  # Calls AVar.VRR_pf(), which is slow for large p
Var.VRR(Rho, n = N - 1, fun = "pfv") # Calls AVar.VRR_pfv(), which requires much RAM for large p
# Various implementations with Rcpp (require eigvaldispRcpp):
## Not run: Var.VRR(Rho, n = N - 1, fun = "pfc") # By default, cppfun = "Cov_r2C" is used
## Not run: Var.VRR(Rho, n = N - 1, cppfun = "Cov_r2A")
## Not run: Var.VRR(Rho, n = N - 1, cppfun = "Cov_r2E")
## Not run: Var.VRR(Rho, n = N - 1, cppfun = "Cov_r2P")
# When the argument cppfun is provided, fun need not be specified
# The above results are identical

# Variance from Konishi's theory
Var.VRR(Rho, n = N - 1, method = "Konishi")  # By default for this method, AVar.VRR_klv() is called
eigvaldisp:::AVar.VRR_klv(Rho, n = N - 1)    # Same as above
Var.VRR(Rho, n = N - 1, fun = "kl")
Var.VRR(Rho, n = N - 1, fun = "kr")
Var.VRR(Rho, n = N - 1, fun = "krv")
# The results are identical, but the last three are slower
# On the other hand, these differ from that obtained with the Pan-Frank method

# Example with p = 2
Rho2 <- GenCov(evalues = c(1.5, 0.5), evectors = "Givens")
Var.VRR(Rho2, n = N - 1)  # When p = 2, this does not call AVar.VRR_pfd()
eigvaldisp:::AVar.VRR_pfd(Rho2, n = N - 1)
# By default, the above returns the same, exact result

eigvaldisp:::AVar.VRR_pfd(Rho2, n = N - 1, var2.mode = "asymptotic")
eigvaldisp:::AVar.VRR_klv(Rho2, n = N - 1)
# These return different asymptotic expressions


watanabe-j/eigvaldisp documentation built on Dec. 8, 2023, 4:38 a.m.