ispca: Inherently Sparse Principal Component Analysis (IS-PCA).

View source: R/ispca.R

ispcaR Documentation

Inherently Sparse Principal Component Analysis (IS-PCA).

Description

Performs IS-PCA

Usage

ispca(
  X,
  K = 1,
  block.structure,
  anp = "2",
  covariance = TRUE,
  method = "CDM",
  orthogonal = FALSE,
  standardize = FALSE,
  verbose = TRUE
)

Arguments

X

Data matrix of dimension nxp with possibly p >> n.

K

Number of singular vectors (loadings) to be computed for each identified submatrix. This means that b*K components are calculated if b blocks are identified by BD-SVD (see Bauer (2025) and bdsvd for details). If K is larger than the number of variables p_i contained in a submatrix, K = p_i loadings are computed.

block.structure

Underlying block structure. This parameter is optional as otherwise the functions runs bdsvd() to identify the underlying block structure. When supplied, it must be a 'bdsvd', 'blocks', or 'ispca' object. E.g., pass the result of bdsvd(), single.bdsvd(), ispca(), or detect.blocks(). An identified block structure by any other method can be supplied using detect.blocks() (see example below).

anp

Which regularization function should be used for the HBIC.

  • "1": implements a_{np} = 1 which corresponds to the BIC.

  • "2": implements a_{np} = 1/2 log(np) which corresponds to the regularization used by Bauer (2025).

  • "3": implements a_{np} = log(log(np)).

  • "4": implements a_{np} = log(log(p)) which corresponds to the regularization used by Wang et al. (2009) and Wang et al. (2013).

covariance

Perform IS-PCA on the covariance (TRUE) or correlation matrix (FALSE). Default is TRUE.

method

Which method should be used to calculate the eigenvectors (loadings) and eigenvalues. method = "DM" uses the method by Yata and Aoshima (2009), method = "CDM" uses the method by Yata and Aoshima (2010), and method = "PCA" uses svd. The methods by Yata and Aoshima (2009, 2010) are consistent in HDLSS settings. Default is method = "CDM".

orthogonal

The estimated eigenvectors (loadings) computed using method = "CDM" (Yata and Aoshima, 2010) are orthogonal in the limit thus only approximately orthogonal in the finite sample case. Should the loadings be orthogonalized? Default is FALSE.

standardize

Standardize the data for block detection using BD-SVD. Default is TRUE. Note: this does not affect the parameter covariance. PCA can be computed on the covariance matrix regardless.

verbose

Print out progress for bdsvd as iterations are performed. Default is TRUE.

Details

This function performs inherently sparse principal component analysis (IS-PCA) as introduced in Bauer (2026).

Value

A list with the following components:

v

The first estimated eigenvectors of the identified block diagonal covariance matrix (i.e., the loadings) as an object of type matrix. The eigenvectors are orthogonalized if orthogonal = TRUE.

l

The corresponding first estimated eigenvalues of the identified block diagonal covariance matrix.

exp.var

The explained variance of the first estimated eigenvalues l.

X.b

The detected submatrices using bdsvd as a list object.

block.structure

Either the block structure detected by bdsvd() or the user-supplied block.structure, depending on the input.

References

Bauer, J.O. (2025). High-dimensional block diagonal covariance structure detection using singular vectors, J. Comput. Graph. Stat., 34(3), 1005–1016

Bauer, J.O. (2026). Beyond regularization: inherently sparse principal component analysis. Stat. Comp.

Yata, K., Aoshima, M. (2009). PCA consistency for non-Gaussian data in high dimension, low sample size contex, Commun. Stat. - Theory Methods 38, 2634–2652.

Yata, K., Aoshima, M. (2010). Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix, J. Multivar. Anal. 101, 2060–2077.

See Also

bdsvd, bdsvd.structure

Examples

#Example 1: run IS-PCA on a gene expression data set with two tissue types

if (requireNamespace("dslabs", quietly = TRUE)) {
data("tissue_gene_expression", package = "dslabs")

#We only select the two tissue types kidney (6) and liver (7)
Y <- as.numeric(tissue_gene_expression$y)
X <- scale(tissue_gene_expression$x[Y %in% c(6, 7), ], scale = FALSE)
Y <- Y[Y %in% c(6, 7)]

# Run IS-PCA
ispca.obj <- ispca(X = X, anp = "1")
vhat <- ispca.obj$v[, 1:2]
ispc <- X %*% vhat

# Percentage of non-zero components in the first two loadings
round(colSums(vhat != 0)/ncol(X), 2)

# Plot the first two principal components
plot(ispc, pch = Y-5, xlab = "PC1", ylab = "PC2", main = "IS-PCA")

# Compare to CDM-PCA (see cdm.pca(...))
pca.obj <- cdm.pca(X = X, K = 2)
pc <- X %*% pca.obj$v

par(mfrow = c(1, 2))
plot(ispc, pch = Y-5, xlab = "PC1", ylab = "PC2", main = "IS-PCA")
plot(pc, pch = Y-5, xlab = "PC1", ylab = "PC2", main = "PCA")
par(mfrow = c(1, 1))
}


#Example 2: submit a block structure which was identified by any other approach. This can be
#done by transforming the block structure to an object of type 'blocks' using detect.blocks():

if (requireNamespace("glasso", quietly = TRUE)) {
#Simulate a data matrix X with a block diagonal population covariance matrix.
set.seed(1)
n <- 100
p <- 4
Sigma <- bdsvd.cov.sim(p = p, b = 2, design = "c")

X <-  matrix(rnorm(n * p), n, p) %*% chol(Sigma)
S <- cov(X)

#Identify the block structure using glasso()
S.block <- glasso::glasso(S, 0.2)$w

#S.blocks is a block diagonal matrix:
print(S.block != 0)
#We know extract the block information to an object of class 'blocks' using detect.blocks()
block.structure <- detect.blocks(S.block)
class(block.structure)

#The block.structure of class 'blocks' can now be supplied to ispca()
ispca(X, block.structure = block.structure, verbose = FALSE)
}


bdsvd documentation built on March 26, 2026, 5:10 p.m.