bootstrap_pca | R Documentation |
pca
functionPerforms bootstrap resampling for Principal Component Analysis (PCA) based on
the method described by Fisher et al. (2016), optimized for high-dimensional
data (p >> n). This version is specifically adapted to work with the output
object generated by the provided pca
function (which returns a bi_projector
object of class 'pca').
bootstrap_pca(
x,
nboot = 100,
k = NULL,
parallel = FALSE,
cores = NULL,
seed = NULL,
epsilon = 1e-15,
...
)
x |
An object of class 'pca' as returned by the provided |
nboot |
The number of bootstrap resamples to perform. Must be a positive integer (default: 100). |
k |
The number of principal components to bootstrap (default: all
components available in the fitted PCA model |
parallel |
Logical flag indicating whether to use parallel processing
via the |
cores |
The number of cores to use for parallel processing if |
seed |
An integer value for the random number generator seed for reproducibility (default: NULL, no seed is set). |
epsilon |
A small positive value added to standard deviations before division to prevent division by zero or instability (default: 1e-15). |
... |
Additional arguments (currently ignored). |
This function implements the fast bootstrap PCA algorithm proposed by
Fisher et al. (2016), adapted for the output structure of the provided pca
function.
The pca
function returns an object containing:
v
: Loadings (coefficients, p x k) - equivalent to V in SVD Y = U D V'. Note the transpose difference from prcomp
.
s
: Scores (n x k) - calculated as U %*% D.
sdev
: Singular values (vector of length k) - equivalent to d.
u
: Left singular vectors (n x k).
The bootstrap algorithm works by resampling the subjects (rows) and recomputing
the SVD on a low-dimensional representation. Specifically, it computes the SVD
of the resampled matrix D U' P^b
, where Y = U D V'
is the SVD of the original
(pre-processed) data, and P^b
is a resampling matrix operating on the subjects (columns of U').
The SVD of the resampled low-dimensional matrix is svd(D U' P^b) = A^b S^b (R^b)'
.
The bootstrap principal components (loadings) are then calculated as V^b = V A^b
,
and the bootstrap scores are Scores^b = R^b S^b
.
Z-scores are provided as mean / sd
.
Important Note: The algorithm assumes the data Y
used for the original SVD (Y = U D V'
)
was appropriately centered (or pre-processed according to x$preproc
). The bootstrap
samples are generated based on the components derived from this pre-processed data.
A list
object of class bootstrap_pca_result
containing:
E_Vb |
Matrix (p x k) of the estimated bootstrap means of the principal components (loadings V^b = coefficients). |
sd_Vb |
Matrix (p x k) of the estimated bootstrap standard deviations of the principal components (loadings V^b). |
z_loadings |
Matrix (p x k) of the bootstrap Z-scores for the loadings, calculated as |
E_Scores |
Matrix (n x k) of the estimated bootstrap means of the principal component scores (S^b). |
sd_Scores |
Matrix (n x k) of the estimated bootstrap standard deviations of the principal component scores (S^b). |
z_scores |
Matrix (n x k) of the bootstrap Z-scores for the scores, calculated as |
E_Ab |
Matrix (k x k) of the estimated bootstrap means of the internal rotation matrices A^b. |
Ab_array |
Array (k x k x nboot) containing all the bootstrap rotation matrices A^b. |
Scores_array |
Array (n x k x nboot) containing all the bootstrap score matrices (S^b, with NAs for non-sampled subjects). |
nboot |
The number of bootstrap samples used (successful ones). |
k |
The number of components bootstrapped. |
call |
The matched call to the function. |
Fisher, Aaron, Brian Caffo, Brian Schwartz, and Vadim Zipunnikov. 2016. "Fast, Exact Bootstrap Principal Component Analysis for P > 1 Million." Journal of the American Statistical Association 111 (514): 846–60. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/01621459.2015.1062383")}.
# --- Assuming the pca and svd_wrapper functions provided are loaded ---
# Also assuming helper functions like center(), pass(), prep(), init_transform()
# are defined and available. Create simplified mocks if needed:
center <- function() structure(list(scale=FALSE, center=TRUE), class=c("center", "prepper"))
pass <- function() structure(list(scale=FALSE, center=FALSE), class=c("pass", "prepper"))
prep <- function(preproc) {
xfun <- function(X) scale(X, center = preproc$center, scale = preproc$scale)
# Add proper reverse logic if needed for projection
return(structure(list(transform = xfun, reverse = function(X) X), class="prep"))
}
init_transform <- function(proc, X) proc$transform(X)
bi_projector <- function(v, s, sdev, u, preproc, classes, method) {
structure(list(v=v, s=s, sdev=sdev, u=u, preproc=preproc, method=method),
class=c(classes, "bi_projector", "projector"))
}
coefficients.projector <- function(x,...) x$v
scores.projector <- function(x,...) x$s
sdev.projector <- function(x,...) x$sdev
# --- End Mock Helpers ---
# Simulate data (p=50, n=20)
set.seed(123)
p_dim <- 50
n_obs <- 20
Y_mat <- matrix(rnorm(p_dim * n_obs), nrow = p_dim, ncol = n_obs)
# Transpose for pca function input (n x p)
X_mat <- t(Y_mat)
# Perform PCA using the provided pca function
# Use center() pre-processing
pca_res <- pca(X_mat, ncomp = 5, preproc = center(), method = "fast")
# Run bootstrap on the pca result
boot_res <- bootstrap_pca(pca_res, nboot = 50, k = 5, seed = 456)
# Explore results
print(dim(boot_res$z_loadings)) # p x k Z-scores for loadings (coefficients)
print(dim(boot_res$z_scores)) # n x k Z-scores for scores
# Plot std dev for first loading vector (V)
plot(boot_res$sd_Vb[, 1], ylab = "Bootstrap SD", main = "Loading Vector 1 Variability")
# Plot std dev for first score vector (S)
plot(boot_res$sd_Scores[, 1], ylab = "Bootstrap SD", main = "Score Vector 1 Variability")
## Not run:
# Example with parallel processing (requires future & future.apply)
# library(future)
# library(future.apply)
# plan(multisession) # Set up parallel backend
boot_res_parallel <- bootstrap_pca(pca_res, nboot = 100, k = 5, seed = 456,
parallel = TRUE)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.