prmats: Principal (Sub)Matrices


prmats    R Documentation

Principal (Sub)Matrices

Description

Identifies the principal (sub)matrices of a data matrix, given its block structure and a selection rule.

Usage

prmats(X, block.structure, rule = "cumvar", value)

Arguments

X

Data matrix of dimension n x p with possibly p >> n.

block.structure

Underlying block structure. Must be a 'bdsvd', 'blocks', or 'ispca' object, e.g., the result of bdsvd(), single.bdsvd(), ispca(), or detect.blocks(). A block structure identified by any other method can be supplied using detect.blocks() (see example below).

rule

Rule used to choose the principal submatrices. rule = "cumvar" selects the smallest number of principal submatrices (ordered by explained variance) whose cumulative share is at least value (a proportion in (0, 1]). rule = "enrich" selects all principal submatrices that explain at least value times the share they should explain on average (see value).

value

Numeric parameter used by rule. If rule = "cumvar", value is the target cumulative proportion of explained variance (must be in (0, 1]); the default is 0.8. If rule = "enrich", value is the factor by which a submatrix must exceed its equal-share baseline to be selected; the default is 2. E.g., if a submatrix should on average explain 10% of the total explained variance and value = 2, this submatrix is selected only if it explains at least 2 x 10% = 20% of the total explained variance.
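The two rules can be illustrated with a minimal base R sketch. It does not call prmats() and is not the package's implementation. The shares in expl.var are made-up numbers, and the equal-share baseline is assumed here to be proportional to block size (p.b / p), which the documentation above does not confirm.

```r
# Hypothetical shares of total variance for four blocks (made-up numbers).
expl.var <- c(A = 0.55, B = 0.30, C = 0.10, D = 0.05)

# Assumed equal-share baseline: each block is expected to explain a share
# proportional to its number of variables, p.b / p (an assumption of this sketch).
p.b <- c(A = 20, B = 30, C = 30, D = 20)
baseline <- p.b / sum(p.b)

# rule = "cumvar": order blocks by explained variance and keep the smallest
# number of blocks whose cumulative share reaches the target value.
value.cumvar <- 0.8
ord <- order(expl.var, decreasing = TRUE)
k <- unname(which(cumsum(expl.var[ord]) >= value.cumvar)[1])
selected.cumvar <- names(expl.var)[ord][seq_len(k)]

# rule = "enrich": keep every block whose share is at least `value` times
# its baseline share.
value.enrich <- 2
enrich.factor <- expl.var / baseline
selected.enrich <- names(enrich.factor)[enrich.factor >= value.enrich]

print(selected.cumvar)
print(selected.enrich)
```

With these made-up numbers, the cumulative rule keeps blocks A and B (0.55 + 0.30 = 0.85, which reaches 0.8), while the enrichment rule keeps only block A (0.55 / 0.20 = 2.75, the only ratio of at least 2).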

Details

This function selects the principal (sub)matrices as described in Bauer (2026).

Value

A named list with the following components:

prmats

List of submatrices ordered by explained variance (rule = 'cumvar') or by factor (rule = 'enrich'). Each element prmats[[b]] is a named list with:

expl.var

Proportion of total variance explained by block b.

avg.var

Average variance of the variables in block b.

factor

Enrichment factor expl.var / avg.var (see value argument of the function).

feature.names

Column names (variables) that belong to block b.

p.b

Number of variables in block b.

X.pr

The data matrix of the kept submatrices/variables.

Access

Submatrices can be accessed with list indexing, e.g., res$prmats[[1]]$feature.names gives the variable names of the first submatrix.
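To collect a component across all selected submatrices, the list can be traversed with sapply() or lapply(). The sketch below uses a hand-built stand-in for the return value: the object res and its contents are hypothetical and only mimic the component names documented above. It is not output from prmats().

```r
# Hypothetical stand-in for a prmats() result with two selected submatrices.
# Only the documented component names are mimicked; the values are made up.
res <- list(
  prmats = list(
    list(expl.var = 0.60, feature.names = c("g1", "g2", "g3"), p.b = 3),
    list(expl.var = 0.25, feature.names = c("g4", "g5"), p.b = 2)
  )
)

# Number of variables and share of variance per submatrix.
block.sizes <- sapply(res$prmats, function(b) b$p.b)
shares <- sapply(res$prmats, function(b) b$expl.var)

# All variable names across the selected submatrices, in list order.
all.features <- unlist(lapply(res$prmats, function(b) b$feature.names))

print(block.sizes)
print(shares)
print(all.features)
```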

References

Bauer, J.O. (2025). High-dimensional block diagonal covariance structure detection using singular vectors. J. Comput. Graph. Stat., 34(3), 1005–1016.

Bauer, J.O. (2026). Beyond regularization: inherently sparse principal component analysis. Stat. Comp.

See Also

bdsvd, ispca

Examples

#Example: principal submatrices of a gene expression data set with two tissue types

if (requireNamespace("dslabs", quietly = TRUE)) {
data("tissue_gene_expression", package = "dslabs")

#We only select the two tissue types kidney (6) and liver (7)
Y <- as.numeric(tissue_gene_expression$y)
X <- scale(tissue_gene_expression$x[Y %in% c(6, 7), ], scale = FALSE)
Y <- Y[Y %in% c(6, 7)]


#First: run IS-PCA (or submit an identified block structure using bdsvd(...) or detect.blocks(...))

ispca.obj <- ispca(X = X, anp = "1")


#Second: extract the submatrices that explain at least 80% (default value) of the total variance

res <- prmats(X, block.structure = ispca.obj)
res

#One submatrix is selected which contains 236 variables (out of 500) and explains
#81.67% of the total variance
length(res$prmats)
res$prmats[[1]]$p.b
round(res$prmats[[1]]$expl.var * 100, 2)


#Alternatively: extract the submatrices that explain at least 1.5 times as much of the
#total variance as they should on average ('factor')

res <- prmats(X, block.structure = ispca.obj, rule = "enrich", value = 1.5)
res

#The highest 'factor' is 1.73, so no submatrix reaches the threshold value = 2
res <- prmats(X, block.structure = ispca.obj, rule = "enrich", value = 2)


}


bdsvd documentation built on March 26, 2026, 5:10 p.m.