ispca: Inherently Sparse Principal Component Analysis (IS-PCA).
In bdsvd: Block Structure Detection Using Singular Vectors

View source: R/ispca.R

ispca

R Documentation

Inherently Sparse Principal Component Analysis (IS-PCA).

Description

Performs IS-PCA

Usage

ispca(
  X,
  K = 1,
  block.structure,
  anp = "2",
  covariance = TRUE,
  method = "CDM",
  orthogonal = FALSE,
  standardize = FALSE,
  verbose = TRUE
)

Arguments

`X`	Data matrix of dimension `n`x`p` with possibly `p >> n`.
`K`	Number of singular vectors (loadings) to be computed for each identified submatrix. This means that `b*K` components are calculated if `b` blocks are identified by BD-SVD (see Bauer (2025) and `bdsvd` for details). If `K` is larger than the number of variables `p_i` contained in a submatrix, `K = p_i` loadings are computed.
`block.structure`	Underlying block structure. This parameter is optional as otherwise the functions runs `bdsvd()` to identify the underlying block structure. When supplied, it must be a '`bdsvd`', '`blocks`', or '`ispca`' object. E.g., pass the result of `bdsvd()`, `single.bdsvd()`, `ispca()`, or `detect.blocks()`. An identified block structure by any other method can be supplied using `detect.blocks()` (see example below).
`anp`	Which regularization function should be used for the HBIC. `"1"`: implements `a_{np} = 1` which corresponds to the BIC. `"2"`: implements `a_{np} = 1/2 log(np)` which corresponds to the regularization used by Bauer (2025). `"3"`: implements `a_{np} = log(log(np))`. `"4"`: implements `a_{np} = log(log(p))` which corresponds to the regularization used by Wang et al. (2009) and Wang et al. (2013).
`covariance`	Perform IS-PCA on the covariance (`TRUE`) or correlation matrix (`FALSE`). Default is `TRUE`.
`method`	Which method should be used to calculate the eigenvectors (loadings) and eigenvalues. `method = "DM"` uses the method by Yata and Aoshima (2009), `method = "CDM"` uses the method by Yata and Aoshima (2010), and `method = "PCA"` uses `svd`. The methods by Yata and Aoshima (2009, 2010) are consistent in HDLSS settings. Default is `method = "CDM"`.
`orthogonal`	The estimated eigenvectors (loadings) computed using `method = "CDM"` (Yata and Aoshima, 2010) are orthogonal in the limit thus only approximately orthogonal in the finite sample case. Should the loadings be orthogonalized? Default is `FALSE`.
`standardize`	Standardize the data for block detection using BD-SVD. Default is `TRUE`. Note: this does not affect the parameter `covariance`. PCA can be computed on the covariance matrix regardless.
`verbose`	Print out progress for `bdsvd` as iterations are performed. Default is `TRUE`.

Details

This function performs inherently sparse principal component analysis (IS-PCA) as introduced in Bauer (2026).

Value

A list with the following components:

v: The first estimated eigenvectors of the identified block diagonal covariance matrix (i.e., the loadings) as an object of type matrix. The eigenvectors are orthogonalized if orthogonal = TRUE.
l: The corresponding first estimated eigenvalues of the identified block diagonal covariance matrix.
exp.var: The explained variance of the first estimated eigenvalues l.
X.b: The detected submatrices using bdsvd as a list object.
block.structure: Either the block structure detected by bdsvd() or the user-supplied block.structure, depending on the input.

References

Bauer, J.O. (2025). High-dimensional block diagonal covariance structure detection using singular vectors, J. Comput. Graph. Stat., 34(3), 1005–1016

Bauer, J.O. (2026). Beyond regularization: inherently sparse principal component analysis. Stat. Comp.

Yata, K., Aoshima, M. (2009). PCA consistency for non-Gaussian data in high dimension, low sample size contex, Commun. Stat. - Theory Methods 38, 2634–2652.

Yata, K., Aoshima, M. (2010). Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix, J. Multivar. Anal. 101, 2060–2077.

Examples

#Example 1: run IS-PCA on a gene expression data set with two tissue types

if (requireNamespace("dslabs", quietly = TRUE)) {
data("tissue_gene_expression", package = "dslabs")

#We only select the two tissue types kidney (6) and liver (7)
Y <- as.numeric(tissue_gene_expression$y)
X <- scale(tissue_gene_expression$x[Y %in% c(6, 7), ], scale = FALSE)
Y <- Y[Y %in% c(6, 7)]

# Run IS-PCA
ispca.obj <- ispca(X = X, anp = "1")
vhat <- ispca.obj$v[, 1:2]
ispc <- X %*% vhat

# Percentage of non-zero components in the first two loadings
round(colSums(vhat != 0)/ncol(X), 2)

# Plot the first two principal components
plot(ispc, pch = Y-5, xlab = "PC1", ylab = "PC2", main = "IS-PCA")

# Compare to CDM-PCA (see cdm.pca(...))
pca.obj <- cdm.pca(X = X, K = 2)
pc <- X %*% pca.obj$v

par(mfrow = c(1, 2))
plot(ispc, pch = Y-5, xlab = "PC1", ylab = "PC2", main = "IS-PCA")
plot(pc, pch = Y-5, xlab = "PC1", ylab = "PC2", main = "PCA")
par(mfrow = c(1, 1))
}


#Example 2: submit a block structure which was identified by any other approach. This can be
#done by transforming the block structure to an object of type 'blocks' using detect.blocks():

if (requireNamespace("glasso", quietly = TRUE)) {
#Simulate a data matrix X with a block diagonal population covariance matrix.
set.seed(1)
n <- 100
p <- 4
Sigma <- bdsvd.cov.sim(p = p, b = 2, design = "c")

X <-  matrix(rnorm(n * p), n, p) %*% chol(Sigma)
S <- cov(X)

#Identify the block structure using glasso()
S.block <- glasso::glasso(S, 0.2)$w

#S.blocks is a block diagonal matrix:
print(S.block != 0)
#We know extract the block information to an object of class 'blocks' using detect.blocks()
block.structure <- detect.blocks(S.block)
class(block.structure)

#The block.structure of class 'blocks' can now be supplied to ispca()
ispca(X, block.structure = block.structure, verbose = FALSE)
}

bdsvd documentation built on March 26, 2026, 5:10 p.m.