| bootstrap_pca | R Documentation |
pca functionPerforms bootstrap resampling for Principal Component Analysis (PCA) based on
the method described by Fisher et al. (2016), optimized for high-dimensional
data (p >> n). This version is specifically adapted to work with the output
object generated by the provided pca function (which returns a bi_projector
object of class 'pca').
bootstrap_pca(
x,
nboot = 100,
k = NULL,
parallel = FALSE,
cores = NULL,
seed = NULL,
epsilon = 1e-15,
...
)
x |
An object of class 'pca' as returned by the provided |
nboot |
The number of bootstrap resamples to perform. Must be a positive integer (default: 100). |
k |
The number of principal components to bootstrap (default: all
components available in the fitted PCA model |
parallel |
Logical flag indicating whether to use parallel processing
via the |
cores |
The number of cores to use for parallel processing if |
seed |
An integer value for the random number generator seed for reproducibility (default: NULL, no seed is set). |
epsilon |
A small positive value added to standard deviations before division to prevent division by zero or instability (default: 1e-15). |
... |
Additional arguments (currently ignored). |
This function implements the fast bootstrap PCA algorithm proposed by
Fisher et al. (2016), adapted for the output structure of the provided pca function.
The pca function returns an object containing:
v: Loadings (coefficients, p x k) - equivalent to V in SVD Y = U D V'. Note the transpose difference from prcomp.
s: Scores (n x k) - calculated as U %*% D.
sdev: Singular values (vector of length k) - equivalent to d.
u: Left singular vectors (n x k).
The bootstrap algorithm works by resampling the subjects (rows) and recomputing
the SVD on a low-dimensional representation. Specifically, it computes the SVD
of the resampled matrix D U' P^b, where Y = U D V' is the SVD of the original
(pre-processed) data, and P^b is a resampling matrix operating on the subjects (columns of U').
The SVD of the resampled low-dimensional matrix is svd(D U' P^b) = A^b S^b (R^b)'.
The bootstrap principal components (loadings) are then calculated as V^b = V A^b,
and the bootstrap scores are Scores^b = R^b S^b.
Z-scores are provided as mean / sd.
Important Note: The algorithm assumes the data Y used for the original SVD (Y = U D V')
was appropriately centered (or pre-processed according to x$preproc). The bootstrap
samples are generated based on the components derived from this pre-processed data.
A list object of class bootstrap_pca_result containing:
E_Vb |
Matrix (p x k) of the estimated bootstrap means of the principal components (loadings V^b = coefficients). |
sd_Vb |
Matrix (p x k) of the estimated bootstrap standard deviations of the principal components (loadings V^b). |
z_loadings |
Matrix (p x k) of the bootstrap Z-scores for the loadings, calculated as |
E_Scores |
Matrix (n x k) of the estimated bootstrap means of the principal component scores (S^b). |
sd_Scores |
Matrix (n x k) of the estimated bootstrap standard deviations of the principal component scores (S^b). |
z_scores |
Matrix (n x k) of the bootstrap Z-scores for the scores, calculated as |
E_Ab |
Matrix (k x k) of the estimated bootstrap means of the internal rotation matrices A^b. |
Ab_array |
Array (k x k x nboot) containing all the bootstrap rotation matrices A^b. |
Scores_array |
Array (n x k x nboot) containing all the bootstrap score matrices (S^b, with NAs for non-sampled subjects). |
nboot |
The number of bootstrap samples used (successful ones). |
k |
The number of components bootstrapped. |
call |
The matched call to the function. |
Fisher, Aaron, Brian Caffo, Brian Schwartz, and Vadim Zipunnikov. 2016. "Fast, Exact Bootstrap Principal Component Analysis for P > 1 Million." Journal of the American Statistical Association 111 (514): 846–60. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/01621459.2015.1062383")}.
# Simulate data (p=50, n=20)
set.seed(123)
p_dim <- 50
n_obs <- 20
Y_mat <- matrix(rnorm(p_dim * n_obs), nrow = p_dim, ncol = n_obs)
# Transpose for pca function input (n x p)
X_mat <- t(Y_mat)
# Perform PCA using the provided pca function
# Use center() pre-processing
pca_res <- pca(X_mat, ncomp = 5, preproc = center(), method = "fast")
# Run bootstrap on the pca result
boot_res <- bootstrap_pca(pca_res, nboot = 5, k = 5, seed = 456)
# Explore results
print(dim(boot_res$z_loadings)) # p x k Z-scores for loadings (coefficients)
print(dim(boot_res$z_scores)) # n x k Z-scores for scores
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.