dimension: Signal subspace dimension estimation in high-dimensional...

View source: R/dimension.R

dimension    R Documentation

Signal subspace dimension estimation in a high-dimensional matrix

Description

Estimate the dimension of a signal-rich subspace in large, high-dimensional data.

Usage

dimension(
  x,
  components = NA,
  decomposition = c("svd", "eigen"),
  method = c("double_posterior", "posterior", "kmeans", "ladle"),
  num_est_samples = NA,
  verbose = FALSE,
  ...
)

Arguments

x

An object of class subspace, or a real-valued numeric matrix with n samples (rows) and p features (columns). If p > n, a warning is issued and the transpose of x is used.

components

The indices of the right singular vectors to estimate (for example, 1:50). All values must be less than or equal to min(nrow(x), ncol(x)).

decomposition

The decomposition to use: decomposition = "svd" uses singular value decomposition; decomposition = "eigen" uses eigenvalue decomposition.

method

The rank-estimation method to use: method = "double_posterior" uses estimate_rank_double_posterior; method = "posterior" uses estimate_rank_posterior; method = "kmeans" uses estimate_rank_kmeans; method = "ladle" uses estimate_rank_ladle. The default is "double_posterior" (estimate_rank_double_posterior).

num_est_samples

Split the data into num_est_samples folds for parallel computation.

verbose

Logical; if TRUE, print messages during computation.

...

Extra parameters
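The x and decomposition arguments can be illustrated with base R alone. The sketch below uses a hypothetical helper, orient_samples (not an MKDim function), that mirrors the documented p > n transpose, and then checks a standard identity: the eigenvalues of t(x) %*% x are the squared singular values of x, and its eigenvectors are the right singular vectors up to sign. It only shows that "svd" and "eigen" describe the same spectrum; it does not reproduce how dimension() scales or uses them.

```r
# Hypothetical helper (not MKDim code): transpose x when p > n,
# mirroring the documented behaviour of the x argument.
orient_samples <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  transpose_flag <- ncol(x) > nrow(x)
  if (transpose_flag) {
    warning("p > n: using the transpose of x")
    x <- t(x)
  }
  list(x = x, transpose_flag = transpose_flag)
}

set.seed(42)
m <- matrix(rnorm(100 * 150), nrow = 100, ncol = 150)  # n = 100 samples, p = 150 features
oriented <- suppressWarnings(orient_samples(m))
x <- oriented$x                                        # now 150 x 100

sv <- svd(x)                                   # x = u %*% diag(d) %*% t(v)
ev <- eigen(crossprod(x), symmetric = TRUE)    # eigendecomposition of t(x) %*% x

# Squared singular values equal the eigenvalues (both in decreasing order)
all.equal(sv$d^2, ev$values)           # TRUE
# Right singular vectors match the eigenvectors up to a sign per column
all.equal(abs(sv$v), abs(ev$vectors))  # TRUE
```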

Value

Returns a list with entries:

ndf:

The number of degrees of freedom of x.

pdim:

The number of dimensions of x.

components:

The indices of the right singular vectors that were estimated.

var_correct:

Corrected population variance for the Marchenko-Pastur distribution.

transpose_flag:

A logical value indicating whether the matrix x is transposed.

irl:

A data frame of scaled eigenvalues for the specified rank and the corresponding dimensions.

mp_irl:

A data frame of expected eigenvalues sampled from the Marchenko-Pastur distribution for the specified rank and the corresponding dimensions.

v:

Right singular vectors of x for the specified rank.

u:

Left singular vectors of x for the specified rank.

dimension:

Estimated signal subspace dimension.

bcp_irl:

Probabilities of a change in mean, and the posterior means, of the eigenvalue differences between $x$ and $N$.
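The u and v entries, together with the singular values, are the ingredients of the rank-K approximation described under Details. The base-R sketch below builds that approximation with svd() (not MKDim internals), once as a matrix product and once as an explicit sum of rank-one terms; the toy sizes, noise level and variable names are illustrative assumptions.

```r
set.seed(3)
n <- 100; p <- 40; K <- 3

# Toy data: a rank-K signal plus small Gaussian noise
signal <- matrix(rnorm(n * K), n, K) %*% matrix(rnorm(K * p), K, p)
x <- signal + matrix(rnorm(n * p, sd = 0.1), n, p)

# Truncated SVD: first K left/right singular vectors and all singular values
s <- svd(x, nu = K, nv = K)
x_hat <- s$u %*% diag(s$d[1:K], K) %*% t(s$v)

# Same approximation as an explicit sum of rank-one terms d_k u_k v_k^T
x_hat_sum <- Reduce(`+`, lapply(seq_len(K), function(k) {
  s$d[k] * tcrossprod(s$u[, k], s$v[, k])
}))

all.equal(x_hat, x_hat_sum)           # TRUE
norm(x - x_hat, "F") / norm(x, "F")   # small, since the residual is mostly noise
```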

Details

We estimate the intrinsic dimension of a signal-rich subspace in large, high-dimensional data by decomposing the matrix into a signal-plus-noise space and approximating the signal-rich subspace with a rank-K approximation $\hat{x} = \sum_{k=1}^{K} d_k u_k v_k^T$, where $d_k$ is the k-th singular value of x and $u_k$ and $v_k$ are the corresponding left and right singular vectors. To estimate the rank K, we propose a simple procedure that assumes x is composed of a low-rank signal matrix S and an averaged general random noise matrix $\bar{N}$, that is, $x = S + \bar{N}$. Random matrix theory shows that the eigenvalues of random noise matrices N asymptotically follow the universal Marchenko-Pastur (MP) distribution. We hypothesize that the deviation of the eigenvalues of x from the MP distribution indicates the intrinsic dimension of the signal-rich subspace.
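To make the MP-deviation idea concrete, here is a minimal base-R sketch under stated assumptions: i.i.d. Gaussian noise with known variance sigma2, a strong rank-5 toy signal, and a simple rule that counts eigenvalues above the MP upper edge $\sigma^2 (1 + \sqrt{p/n})^2$ inflated by an assumed 5% margin. It is a generic thresholding illustration, not the double_posterior, posterior, kmeans or ladle estimators that dimension() uses.

```r
set.seed(2022)
n <- 1000; p <- 200; K <- 5; sigma2 <- 1

# x = S + N: a rank-K signal matrix S plus a pure-noise matrix N
S <- matrix(rnorm(n * K), n, K) %*% matrix(rnorm(K * p), K, p)
N <- matrix(rnorm(n * p, sd = sqrt(sigma2)), n, p)
x <- S + N

# Eigenvalues of the (uncentered) sample covariance t(x) %*% x / n, decreasing
lambda <- eigen(crossprod(x) / n, symmetric = TRUE, only.values = TRUE)$values

# MP upper edge for aspect ratio p / n, inflated by an assumed 5% margin
# to absorb finite-sample fluctuation of the largest noise eigenvalue
mp_upper <- sigma2 * (1 + sqrt(p / n))^2
margin <- 1.05

estimated_dimension <- sum(lambda > margin * mp_upper)
estimated_dimension  # 5
```

The margin is a hand-picked choice: at finite n and p the largest noise eigenvalue fluctuates around the MP edge on the Tracy-Widom scale, so a bare edge threshold can occasionally count one noise eigenvalue as signal.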

See Also

RMTstat for details of the Marchenko-Pastur distribution.

https://dracodoc.wordpress.com/2014/07/21/a-simple-algorithm-to-detect-flat-segments-in-noisy-signals/ for detecting flat segments and spikes in noisy signals.
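The bcp_irl entry reports where the mean of the eigenvalue differences changes, and the linked post discusses finding flat segments in noisy signals. As a much simpler stand-in (not the Bayesian change-point analysis behind bcp_irl, and not the blog's algorithm), the sketch below locates a single change in mean by choosing the split that minimises the total within-segment sum of squares. The helper name single_changepoint and the toy data are hypothetical.

```r
# Hypothetical helper (not from MKDim): single change point in mean,
# chosen to minimise the total within-segment sum of squared errors.
single_changepoint <- function(y) {
  stopifnot(is.numeric(y), length(y) >= 2)
  sse <- function(v) sum((v - mean(v))^2)
  cost <- vapply(seq_len(length(y) - 1), function(k) {
    sse(y[1:k]) + sse(y[(k + 1):length(y)])
  }, numeric(1))
  which.min(cost)  # last index of the first segment
}

# Toy "eigenvalue differences": large for 5 signal components, near zero after
set.seed(11)
diffs <- c(rnorm(5, mean = 3, sd = 0.2), rnorm(45, mean = 0, sd = 0.2))
single_changepoint(diffs)  # 5
```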

Examples

x <- x_sim(n = 100, p = 150, ncc = 10, var = c(rep(10, 5), rep(1, 5)))
results <- dimension(x, components = 1:50)

# Equivalently, if the subspace has already been calculated:
Subspace <- subspace(x, components = 1:50)
results <- dimension(Subspace, method = "double_posterior")

str(results)
plot(results$subspace, changepoint = results$dimension,
     annotation = 10)
modified_legacyplot(results$bcp_irl, annotation = 10)

WenlanzZ/MKDim documentation built on July 30, 2022, 7:25 a.m.