perm_test: Generic Permutation-Based Test
In bbuchsbaum/multivarious: Extensible Data Structures for Multivariate Analysis


Description

This generic function implements a permutation-based test to assess the significance of components or statistics in a fitted model. The actual procedure depends on the method defined for the specific model class; the typical workflow is outlined under Details.

Usage

perm_test(x, ...)

Arguments

`x`	A fitted model object (e.g. `pca`, `cross_projector`, `discriminant_projector`, `multiblock_biprojector`).
`...`	Additional arguments passed down to `shuffle_fun` or `measure_fun` (if applicable). Note: for the `multiblock` methods, `Xlist`, `comps`, `alpha`, and `use_rspectra` (biprojector only) are handled as direct named arguments, not via `...`.
`X`	(Used by `pca`, `cross_projector`, `discriminant_projector`) The original primary data matrix used to fit `x`. Ignored by the `multiblock_biprojector` method.
`Y`	(Used by `cross_projector`) The secondary data block (n x pY). Ignored by other methods.
`Xlist`	(Used by `multiblock_biprojector` [optional, default `NULL`] and `multiblock_projector` [required]) List of data blocks.
`nperm`	Integer number of permutations (Default: 1000 for PCA, 500 for multiblock methods, 100 otherwise).
`measure_fun`	(Optional; Used by `pca`, `cross_projector`, `discriminant_projector`, `multiblock_projector`) A function for computing the statistic(s) of interest. Ignored by `multiblock_biprojector`. Signature/default varies by method (see Details).
`shuffle_fun`	(Optional; Used by all methods) A function for permuting the data appropriately. Signature/default varies by method (see Details).
`fit_fun`	(Optional; Used by `cross_projector`, `discriminant_projector`) A function for re-fitting a new model. Ignored by PCA and multiblock methods. Signature/default varies by method (see Details).
`stepwise`	(Used by `pca`) Logical indicating whether sequential testing (P3 projection) should be performed. Default `TRUE`. Ignored by all other methods; the multiblock methods still test sequentially, controlled by `alpha` and `comps`.
`parallel`	(Used by all methods) Logical; if `TRUE`, attempt parallel execution via `future.apply::future_lapply`. Requires the `future.apply` package.
`alternative`	(Used by all methods) Character string for the alternative hypothesis: "greater" (default), "less", or "two.sided".
`alpha`	(Used by `pca`, `multiblock_biprojector`, `multiblock_projector`) Significance level for sequential stopping rule (default 0.05). Passed directly as a named argument to these methods.
`comps`	(Used by `pca`, `multiblock_biprojector`, `multiblock_projector`) Maximum number of components to test sequentially (default 4). Passed directly as a named argument to these methods.
`use_svd_solver`	(Used by `pca`) Optional string specifying the SVD solver (default "fast").
`use_rspectra`	(Used by `multiblock_biprojector`) Logical indicating whether to use RSpectra for eigenvalue calculation (default `TRUE`). Passed directly as a named argument.
`predict_method`	(Used by `discriminant_projector`) Prediction method (`"lda"` or `"euclid"`) used by the default measure function (default "lda").

Details

A typical permutation test proceeds in four steps:

1. Shuffle or permute the data in a way that breaks the structure of interest (e.g., shuffle labels for supervised methods; shuffle columns or rows for unsupervised methods).
2. Re-fit or re-project the model on the permuted data. Depending on the class, this is done via fit_fun or a class-specific approach.
3. Measure the statistic of interest (e.g., variance explained, classification accuracy, canonical correlation).
4. Compare the observed statistic with the distribution of permuted statistics to compute an empirical p-value.
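The four steps can be sketched in base R as follows. This is a hypothetical helper, perm_test_sketch(), written only for illustration: it is not how multivarious implements perm_test(), and both the measure_fun(fit, data) signature and the "+1" p-value correction, (count + 1) / (nperm + 1), are assumptions (a common convention) rather than confirmed package behavior.

```r
# Hypothetical skeleton of a permutation test (not multivarious internals).
# fit_fun, measure_fun and shuffle_fun play the roles described in Details.
perm_test_sketch <- function(data, fit_fun, measure_fun, shuffle_fun,
                             nperm = 100,
                             alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  observed <- measure_fun(fit_fun(data), data)
  perm_values <- vapply(seq_len(nperm), function(i) {
    dperm <- shuffle_fun(data)            # 1. break the structure of interest
    measure_fun(fit_fun(dperm), dperm)    # 2-3. refit and measure
  }, numeric(1))
  # 4. empirical p-value; two-sided is centered on the permutation mean
  #    (an assumed convention for this sketch)
  count <- switch(alternative,
    greater   = sum(perm_values >= observed),
    less      = sum(perm_values <= observed),
    two.sided = sum(abs(perm_values - mean(perm_values)) >=
                      abs(observed - mean(perm_values))))
  list(statistic = observed, perm_values = perm_values,
       p.value = (count + 1) / (nperm + 1), alternative = alternative)
}

# Usage: does x explain y better than chance? (simulated data)
set.seed(1)
n <- 50
d <- data.frame(x = rnorm(n))
d$y <- d$x + rnorm(n, sd = 0.5)
res <- perm_test_sketch(
  d,
  fit_fun     = function(dat) lm(y ~ x, data = dat),
  measure_fun = function(fit, dat) summary(fit)$r.squared,
  shuffle_fun = function(dat) { dat$y <- sample(dat$y); dat },
  nperm = 199)
res$p.value
```

Counting the observed statistic as one member of the null distribution keeps the p-value strictly positive; with 199 permutations the smallest attainable value is 1/200.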

S3 methods define the specific defaults and required signatures for the functions involved in shuffling, fitting, and measuring.
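As a minimal illustration of that S3 pattern, the hypothetical generic perm_test_demo() below dispatches to a method for a made-up class "toy_model", and the method's own signature carries the defaults. These names are invented for the sketch and are not part of multivarious.

```r
# Hypothetical generic: dispatches on the class of x, like perm_test(x, ...)
perm_test_demo <- function(x, ...) UseMethod("perm_test_demo")

# Fallback when no class-specific method exists
perm_test_demo.default <- function(x, ...) {
  stop("no perm_test_demo method for class ", class(x)[1])
}

# Method for the made-up class "toy_model"; defaults live in its signature
perm_test_demo.toy_model <- function(x, nperm = 100,
                                     alternative = "greater", ...) {
  list(class = class(x)[1], nperm = nperm, alternative = alternative)
}

m <- structure(list(), class = "toy_model")
str(perm_test_demo(m))                # the method's defaults are used
perm_test_demo(m, nperm = 10)$nperm   # caller overrides a default
```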

This function provides a framework for permutation testing in various multivariate models. The specific implementation details, default functions, and relevant arguments vary by method.

PCA Method (perm_test.pca): Relevant arguments: X, nperm, measure_fun, shuffle_fun, stepwise, parallel, alternative, alpha, comps, use_svd_solver, .... Assesses the significance of the variance explained by each PC (Vitale et al., 2017). Default statistic: F_a. Default shuffle: column-wise. By default, uses the P3 projection and sequential stopping at level alpha.
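The idea behind column-wise shuffling with sequential stopping can be sketched in base R. This is a simplified, parallel-analysis-style variant written for illustration: the hypothetical pca_perm_sketch() uses "share of the remaining variance captured by component a" as an assumed stand-in for F_a, and it does not implement the P3 projection, so its p-values will not match perm_test.pca().

```r
# Hypothetical, simplified PCA permutation test (no P3 projection).
pca_perm_sketch <- function(X, comps = 3, nperm = 199, alpha = 0.05) {
  X <- scale(X, center = TRUE, scale = FALSE)
  # assumed statistic: eigenvalue a over the sum of eigenvalues a..p
  frac_stat <- function(M, a) {
    ev <- svd(M, nu = 0, nv = 0)$d^2
    ev[a] / sum(ev[a:length(ev)])
  }
  pvals <- numeric(0)
  for (a in seq_len(comps)) {
    obs <- frac_stat(X, a)
    perm <- replicate(nperm, {
      Xp <- apply(X, 2, sample)   # shuffle each column independently
      frac_stat(Xp, a)
    })
    pvals[a] <- (sum(perm >= obs) + 1) / (nperm + 1)
    if (pvals[a] > alpha) break   # sequential stopping rule
  }
  pvals
}

set.seed(42)
p <- pca_perm_sketch(as.matrix(iris[, 1:4]), comps = 3, nperm = 99)
p
```

Shuffling each column independently keeps every variable's marginal distribution but destroys the correlations between variables, which is the structure a principal component summarizes.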

Cross Projector Method (perm_test.cross_projector): Relevant arguments: X, Y, nperm, measure_fun, shuffle_fun, fit_fun, parallel, alternative, .... Tests the X-Y relationship. Default statistic: x2y.mse. Default shuffle: rows of Y. Default fit: stats::cancor.
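A base R sketch of the row-shuffle idea with stats::cancor() follows. As an assumption made for illustration, the statistic here is the first canonical correlation (higher means a stronger X-Y relationship), not the package's default x2y.mse; first_cc() is a hypothetical helper.

```r
# Row-shuffle test of the X-Y relationship, refitting cancor() each time.
set.seed(7)
X <- as.matrix(iris[, 1:2])
Y <- as.matrix(iris[, 3:4])
first_cc <- function(X, Y) cancor(X, Y)$cor[1]   # largest canonical correlation

obs <- first_cc(X, Y)
nperm <- 199
# permuting the rows of Y breaks the pairing between X and Y observations
perm <- replicate(nperm, first_cc(X, Y[sample(nrow(Y)), , drop = FALSE]))

# alternative = "greater": large correlations are evidence against H0
p_value <- (sum(perm >= obs) + 1) / (nperm + 1)
c(observed = obs, p.value = p_value)
```

For an error-type statistic such as x2y.mse, smaller values indicate a stronger relationship, which is why the example below uses alternative="less".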

Discriminant Projector Method (perm_test.discriminant_projector): Relevant arguments: X, nperm, measure_fun, shuffle_fun, fit_fun, predict_method, parallel, alternative, .... Tests class separation. Default statistic: prediction accuracy. Default shuffle: labels. Default fit: MASS::lda.
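The label-shuffling logic can be sketched without extra packages. To keep the example dependency-free, a nearest-centroid classifier stands in for MASS::lda (a swapped-in technique), accuracy is measured on the training data itself, and centroid_accuracy() is a hypothetical helper.

```r
# Hypothetical nearest-centroid accuracy (stand-in for LDA prediction)
centroid_accuracy <- function(X, labels) {
  labels <- factor(labels)
  cents <- rowsum(X, labels) / as.vector(table(labels))  # class means
  # squared Euclidean distance from every row to every class centroid
  d2 <- outer(rowSums(X^2), rowSums(cents^2), "+") - 2 * X %*% t(cents)
  pred <- levels(labels)[max.col(-d2, ties.method = "first")]
  mean(pred == as.character(labels))
}

set.seed(3)
X <- as.matrix(iris[, 1:4])
y <- iris$Species
obs <- centroid_accuracy(X, y)
nperm <- 199
# shuffling the labels breaks any link between class and measurements
perm <- replicate(nperm, centroid_accuracy(X, sample(y)))
p_value <- (sum(perm >= obs) + 1) / (nperm + 1)
c(accuracy = obs, p.value = p_value)
```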

Multiblock Bi-Projector Method (perm_test.multiblock_biprojector): Relevant arguments: Xlist (optional), nperm, shuffle_fun, parallel, alternative, alpha, comps, use_rspectra, .... Tests consensus across blocks using a fixed internal statistic computed on the scores of each component; measure_fun is not used. For component k, the statistic is the leading eigenvalue of the covariance matrix of the block scores (i.e., of T^T T up to centering and scaling, where column b of T holds the scores of block b on component k). By default, the rows of each block are shuffled independently (taken from Xlist if supplied, otherwise from the internally stored scores). Components up to comps are tested sequentially, with stopping governed by alpha; Xlist, comps, and alpha are passed as named arguments, not via ....
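For a single component k, the statistic and the within-block row shuffle can be sketched in base R. The block scores below are simulated placeholders rather than scores from a fitted model, lead_eig() is a hypothetical helper, and using cov() (rather than a raw cross-product) is an assumption of this sketch.

```r
# Leading eigenvalue of the covariance of block scores (one column per block)
lead_eig <- function(scores) {
  eigen(cov(scores), symmetric = TRUE, only.values = TRUE)$values[1]
}

set.seed(11)
n <- 60
shared <- rnorm(n)                     # signal common to all blocks
T_k <- cbind(block1 = shared + rnorm(n, sd = 0.5),
             block2 = shared + rnorm(n, sd = 0.5),
             block3 = shared + rnorm(n, sd = 0.5))

obs <- lead_eig(T_k)
nperm <- 199
# shuffle rows of each block independently: variances are preserved,
# but agreement between blocks is destroyed
perm <- replicate(nperm, lead_eig(apply(T_k, 2, sample)))
p_value <- (sum(perm >= obs) + 1) / (nperm + 1)
c(statistic = obs, p.value = p_value)
```

In the sequential version, this test would be repeated for k = 1, 2, ... up to comps, stopping once a component fails at level alpha.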

Multiblock Projector Method (perm_test.multiblock_projector): Relevant arguments: Xlist (required), nperm, measure_fun, shuffle_fun, parallel, alternative, alpha, comps, .... Tests consensus using measure_fun (default: mean absolute correlation) on scores obtained by projecting Xlist with the original model x; the model is not refit.
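A no-refit consensus test can be sketched as follows: each block is projected with fixed loadings, and the statistic is the mean absolute off-diagonal correlation among the resulting block scores. The data, the loading vectors V, and the helpers mean_abs_cor() and project() are hypothetical; shuffling the rows of each block is assumed here as the null-generating step.

```r
# Mean absolute off-diagonal correlation among block score columns
mean_abs_cor <- function(S) {
  R <- cor(S)
  mean(abs(R[upper.tri(R)]))
}

set.seed(5)
n <- 80
z <- rnorm(n)                                    # shared latent signal
Xlist <- list(
  X1 = outer(z, c(1, 0.8, 0.6)) + matrix(rnorm(n * 3), n, 3),
  X2 = outer(z, c(0.9, 0.7))    + matrix(rnorm(n * 2), n, 2))
V <- list(c(1, 0.8, 0.6), c(0.9, 0.7))           # fixed loadings, never refit

# project each block with its fixed loadings -> n x (number of blocks) scores
project <- function(Xlist) mapply(function(Xb, v) Xb %*% v, Xlist, V)

obs <- mean_abs_cor(project(Xlist))
nperm <- 199
perm <- replicate(nperm, mean_abs_cor(project(
  lapply(Xlist, function(Xb) Xb[sample(nrow(Xb)), , drop = FALSE]))))
p_value <- (sum(perm >= obs) + 1) / (nperm + 1)
c(statistic = obs, p.value = p_value)
```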

Value

The structure of the return value depends on the method:

cross_projector and discriminant_projector: Returns an object of class perm_test, a list containing statistic, perm_values, p.value, alternative, method, nperm, and call.
pca, multiblock_biprojector, and multiblock_projector: Returns an object inheriting from perm_test (class perm_test_pca for pca, perm_test_multiblock for multiblock_biprojector, and perm_test for multiblock_projector), a list containing: component_results (data frame with the observed statistic, p-value, and confidence intervals for each component), perm_values (matrix of permuted statistics), alpha (if applicable), alternative, method, nperm (number of successful permutations per component), and call.

References

Buja, A., & Eyuboglu, N. (1992). Remarks on parallel analysis. Multivariate Behavioral Research, 27(4), 509-540. (Relevant for PCA permutation concepts)

Vitale, R., Westerhuis, J. A., Næs, T., Smilde, A. K., de Noord, O. E., & Ferrer, A. (2017). Selecting the number of factors in principal component analysis by permutation testing—Numerical and practical aspects. Journal of Chemometrics, 31(10), e2937. doi:10.1002/cem.2937 (Specific to perm_test.pca)

Examples

# PCA Example
data(iris)
X_iris <- as.matrix(iris[,1:4])
mod_pca <- pca(X_iris, ncomp=4, preproc=center()) # Ensure centering

# Test first 3 components sequentially (use a larger nperm, e.g. 1000, for reliable p-values)
# Ensure a future plan is set for parallel=TRUE, e.g., future::plan("multisession")
res_pca <- perm_test(mod_pca, X_iris, nperm=50, comps=3, parallel=FALSE)
print(res_pca)

# PCA Example with row shuffling (tests a different null hypothesis)
row_shuffle <- function(dat, ...) dat[sample(nrow(dat)), ]
res_pca_row <- perm_test(mod_pca, X_iris, nperm=50, comps=3,
                         shuffle_fun=row_shuffle, parallel=FALSE)
print(res_pca_row)

## Not run: 
# Cross Projector Example (using cancor)
X <- as.matrix(iris[,1:2])
Y <- as.matrix(iris[,3:4])
ccr <- cancor(X, Y)
mod_cp <- cross_projector(ccr$xcoef, ccr$ycoef)

# Perm test (is x2y.mse lower than chance?)
res_cp <- perm_test(mod_cp, X, Y=Y, nperm=50, alternative="less")
print(res_cp)

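# Discriminant Projector Example (using LDA)
library(MASS)
lda_fit <- lda(X_iris, grouping=iris$Species)
# MASS::lda() returns no covariance matrix (lda_fit$covariance is NULL), so
# compute the pooled within-class covariance (assumed to be what Sigma expects)
X_within <- X_iris - lda_fit$means[as.integer(iris$Species), ]
Sigma_w <- crossprod(X_within) / (nrow(X_iris) - nlevels(iris$Species))
mod_dp <- discriminant_projector(
  v = lda_fit$scaling,
  s = scale(X_iris, scale=FALSE) %*% lda_fit$scaling, # scores of the centered data
  sdev = lda_fit$svd,
  labels = iris$Species,
  preproc = prep(center()), # centering, matching the scores above
  Sigma = Sigma_w # used by the "lda" prediction method
)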

# Perm test (is accuracy higher than chance?)
res_dp <- perm_test(mod_dp, X_iris, nperm=50, alternative="greater")
print(res_dp)

# Multiblock Bi-Projector Example
# (Requires a multiblock model 'mod_mb' from e.g. MFA or ComDim)
# Assuming 'mod_mb' exists and has 2 blocks:
# res_mb <- perm_test(mod_mb, nperm=50, comps=3) 
# print(res_mb)
# Example using provided Xlist (list of matrices X1, X2):
# X1 <- matrix(rnorm(50*10), 50, 10)
# X2 <- matrix(rnorm(50*15), 50, 15)
# Assume mod_mb was fit on cbind(X1, X2) with block_indices=list(1:10, 11:25)
# res_mb_xlist <- perm_test(mod_mb, Xlist=list(X1, X2), nperm=50, comps=3)
# print(res_mb_xlist)

## End(Not run)

bbuchsbaum/multivarious documentation built on July 16, 2025, 11:04 p.m.