prmats: Principal (Sub)Matrices
In bdsvd: Block Structure Detection Using Singular Vectors

View source: R/ispca.R

prmats	R Documentation

Principal (Sub)Matrices

Description

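Identifies the principal (sub)matrices of a data matrix `X`, i.e., the submatrices (blocks of variables) of a given block structure that are selected according to their share of explained variance.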

Usage

prmats(X, block.structure, rule = "cumvar", value)

Arguments

`X`	Data matrix of dimension `n` x `p`, with possibly `p >> n`.
`block.structure`	Underlying block structure. Must be a '`bdsvd`', '`blocks`', or '`ispca`' object, e.g., the result of `bdsvd()`, `single.bdsvd()`, `ispca()`, or `detect.blocks()`. A block structure identified by any other method can be supplied via `detect.blocks()` (see the example below).
`rule`	Rule used to choose the principal submatrices. `rule = "cumvar"` selects the smallest number of principal submatrices (ordered by explained variance) whose cumulative share of explained variance is at least `value` (a proportion in `(0, 1]`). `rule = "enrich"` selects all principal submatrices that explain at least `value` times more variance than they would on average (see `value`).
`value`	Numeric parameter used by `rule`. If `rule = "cumvar"`, `value` is the target cumulative proportion of explained variance (must be in `(0, 1]`); the default is `0.8`. If `rule = "enrich"`, `value` is the factor by which a submatrix must exceed the equal-share baseline in order to be selected. E.g., if a submatrix should on average explain 10% of the total explained variance and `value = 2`, it is only selected if it explains at least 2 x 10% = 20% of the total explained variance; the default is `2`.
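The two selection rules can be illustrated on a plain vector of explained-variance proportions. This is a minimal sketch, not the package's implementation: the helpers `select_cumvar()` and `select_enrich()` are hypothetical, the shares are made up, and the equal-share baseline is assumed to be `1/B` for `B` submatrices (which would match the 10% example above for `B = 10`).

```r
# Hypothetical helpers sketching the two selection rules of prmats();
# they are NOT part of bdsvd.

# rule = "cumvar": smallest set of submatrices (ordered by explained
# variance) whose cumulative share reaches `value`
select_cumvar <- function(expl.var, value = 0.8) {
  stopifnot(value > 0, value <= 1)
  ord <- order(expl.var, decreasing = TRUE)
  cs <- cumsum(expl.var[ord])
  # small tolerance guards against floating-point rounding in cumsum()
  k <- which(cs >= value - 1e-12)[1]
  if (is.na(k)) k <- length(ord)  # target not reached: keep all
  ord[seq_len(k)]
}

# rule = "enrich": all submatrices explaining at least `value` times the
# equal-share baseline (ASSUMPTION: baseline = 1 / number of submatrices)
select_enrich <- function(expl.var, value = 2) {
  baseline <- 1 / length(expl.var)
  fac <- expl.var / baseline
  idx <- which(fac >= value)
  idx[order(fac[idx], decreasing = TRUE)]  # ordered by factor
}

# Made-up shares of four submatrices
expl.var <- c(0.1, 0.5, 0.3, 0.1)
select_cumvar(expl.var, value = 0.75)  # 2 3  (0.5 + 0.3 >= 0.75)
select_enrich(expl.var, value = 1.1)   # 2 3  (factors 0.4, 2.0, 1.2, 0.4)
```

With `value = 2`, only the second submatrix (factor exactly 2) would be kept under the assumed baseline.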

Details

This function selects the principal (sub)matrices as described in Bauer (2026).

Value

A named list with the following components:

prmats

List of submatrices ordered by explained variance (rule = 'cumvar') or by factor (rule = 'enrich'). Each element prmats[[b]] is a named list with:

expl.var: Proportion of total variance explained by block b.
avg.var: Average variance of the variables in block b.
factor: Enrichment factor expl.var / avg.var (see value argument of the function).
feature.names: Column names (variables) that belong to block b.
p.b: Number of variables in block b.
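Since each element of `prmats` is a named list with the components above, the selected submatrices can be tabulated side by side. A minimal sketch: the helper `prmats_table()` is hypothetical (not part of bdsvd), and `res_mock` only mimics the documented structure with made-up numbers; with real data, pass the output of `prmats()` instead.

```r
# Hypothetical helper: one row per submatrix in res$prmats, built from
# the documented components of each element
prmats_table <- function(res) {
  data.frame(
    block    = seq_along(res$prmats),
    p.b      = vapply(res$prmats, function(b) b$p.b, numeric(1)),
    expl.var = vapply(res$prmats, function(b) b$expl.var, numeric(1)),
    factor   = vapply(res$prmats, function(b) b$factor, numeric(1))
  )
}

# Mock object with made-up values (factor = expl.var / avg.var)
res_mock <- list(prmats = list(
  list(expl.var = 0.6, avg.var = 0.3, factor = 2,
       feature.names = c("g1", "g2"), p.b = 2),
  list(expl.var = 0.25, avg.var = 0.2, factor = 1.25,
       feature.names = c("g3", "g4", "g5"), p.b = 3)
))

tab <- prmats_table(res_mock)
tab
sum(tab$expl.var)  # cumulative share of the listed submatrices
```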

X.pr

The data matrix restricted to the variables of the selected submatrices.

Access

Submatrices can be accessed with list indexing, e.g., res$prmats[[1]]$feature.names gives the variable names of the first submatrix.
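Extending this indexing, the variable names of all selected submatrices can be collected with `lapply()` and used to subset the original data. A minimal sketch: `selected_features()` is a hypothetical helper, the toy matrix and the mock result (holding only the components needed here) are made up, and the resulting subset is only assumed to correspond to `res$X.pr`, possibly up to column order.

```r
# Hypothetical helper: variable names of all submatrices in res$prmats,
# in list order
selected_features <- function(res) {
  unlist(lapply(res$prmats, function(b) b$feature.names), use.names = FALSE)
}

# Made-up toy data and a mock result mimicking the documented structure
set.seed(1)
X <- matrix(rnorm(20 * 6), nrow = 20,
            dimnames = list(NULL, paste0("g", 1:6)))
res_mock <- list(prmats = list(
  list(feature.names = c("g2", "g5"), p.b = 2),
  list(feature.names = "g1", p.b = 1)
))

feats <- selected_features(res_mock)
feats                              # "g2" "g5" "g1"
X_sub <- X[, feats, drop = FALSE]  # compare with res$X.pr for real output
dim(X_sub)                         # 20 3
```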

References

Bauer, J.O. (2025). High-dimensional block diagonal covariance structure detection using singular vectors. J. Comput. Graph. Stat., 34(3), 1005–1016.

Bauer, J.O. (2026). Beyond regularization: inherently sparse principal component analysis. Stat. Comput.

Examples

#Example: principal submatrices of a gene expression data set with two tissue types

if (requireNamespace("dslabs", quietly = TRUE)) {
data("tissue_gene_expression", package = "dslabs")

#We only select the two tissue types kidney (6) and liver (7)
Y <- as.numeric(tissue_gene_expression$y)
X <- scale(tissue_gene_expression$x[Y %in% c(6, 7), ], scale = FALSE)
Y <- Y[Y %in% c(6, 7)]


#First: run IS-PCA (or supply an already identified block structure via bdsvd(...) or detect.blocks(...))

ispca.obj <- ispca(X = X, anp = "1")


#Second: extract the submatrices that explain at least 80% (default value) of the total variance

res <- prmats(X, block.structure = ispca.obj)
res

#One submatrix is selected, which contains 236 variables (out of 500) and explains
#81.67% of the total variance
length(res$prmats)
res$prmats[[1]]$p.b
round(res$prmats[[1]]$expl.var * 100, 2)


#Alternatively: extract the submatrices that explain at least 1.5 times more of the total variance
#than they should on average ('factor')

res <- prmats(X, block.structure = ispca.obj, rule = "enrich", value = 1.5)
res

#The highest 'factor' is 1.73, so no submatrix reaches value = 2
res <- prmats(X, block.structure = ispca.obj, rule = "enrich", value = 2)


}

bdsvd documentation built on March 26, 2026, 5:10 p.m.