universal_null | R Documentation |
An inference procedure to determine which topological features (if any) of a dataset are likely signal (i.e. significant) vs noise (not).
universal_null(
X,
FUN_diag = "calculate_homology",
maxdim = 1,
thresh,
distance_mat = FALSE,
ripser = NULL,
ignore_infinite_cluster = TRUE,
calculate_representatives = FALSE,
alpha = 0.05,
return_pvals = FALSE,
infinite_cycle_inference = FALSE
)
X |
the input dataset, which must be either a matrix or a data frame. |
FUN_diag |
a string representing the persistent homology function to use for calculating the full persistence diagram, either 'calculate_homology' (the default), 'PyH' or 'ripsDiag'. |
maxdim |
the integer maximum homological dimension for persistent homology, default 1. |
thresh |
the positive numeric maximum radius of the Vietoris-Rips filtration. |
distance_mat |
a boolean representing if 'X' is a distance matrix (TRUE) or not (FALSE, default). |
ripser |
the imported ripser module when 'FUN_diag' is 'PyH'. |
ignore_infinite_cluster |
a boolean indicating whether or not to ignore the infinitely lived cluster when 'FUN_diag' is 'PyH'. If infinite cycle inference is to be performed, this parameter should be set to FALSE. |
calculate_representatives |
a boolean representing whether to calculate representative (co)cycles, default FALSE. Note that representatives cannot be calculated when using the 'calculate_homology' function, and that representatives cannot be computed for (significant) infinite cycles. |
alpha |
the type-1 error threshold, default 0.05. |
return_pvals |
a boolean representing whether or not to return p-values for features in the subsetted diagram as well as a list of p-value thresholds, default FALSE. Infinite cycles that are significant (see below) will have p-value NA in this list, as the true value is unknown but less than its dimension's p-value threshold. |
infinite_cycle_inference |
a boolean representing whether or not to perform inference for features with infinite (i.e. 'thresh') death values, default FALSE. If 'FUN_diag' is 'calculate_homology' (the default) then no infinite cycles will be returned by the persistent homology calculation at all. |
For each feature in a diagram we compute its persistence ratio \pi = death/birth, and a test statistic A log log \pi + B (where A and B are constants). This statistic is compared to a left-skewed Gumbel distribution to get a p-value. A Bonferroni correction is then applied to the p-values: a feature is significant when its p-value is below 'alpha' divided by the number of features in its dimension. When 'return_pvals' is TRUE, a list of these p-value thresholds, one for each dimension, is also returned.
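The calculation described above can be sketched in a few lines of R. This is an illustration only, not the package's internal code: the helper name gumbel_pvals is hypothetical, the values A = 1 and B = 0 are placeholders rather than the constants the package uses, and the p-value is assumed to be the upper tail of a standard left-skewed Gumbel distribution, exp(-exp(statistic)). The sketch also assumes birth > 0 and death > birth, so that log log \pi is defined.

```r
# Hypothetical helper (not part of the package API): p-values from
# persistence ratios, assuming a standard left-skewed Gumbel null.
gumbel_pvals <- function(birth, death, A = 1, B = 0) {
  ratio <- death / birth              # persistence ratio pi
  stat <- A * log(log(ratio)) + B     # test statistic A log log pi + B
  exp(-exp(stat))                     # assumed upper-tail probability
}

# toy features from a single homological dimension (made-up values)
birth <- c(0.10, 0.02, 0.25)
death <- c(0.15, 1.60, 0.30)

p <- gumbel_pvals(birth, death)

# Bonferroni threshold: alpha divided by the number of features
alpha <- 0.05
threshold <- alpha / length(p)

# features whose p-value falls below the threshold are kept
significant <- p < threshold
```

With the placeholder constants A = 1 and B = 0 the p-value reduces to birth/death, which makes the sketch easy to check by hand; real constants would shift and rescale the statistic before the same comparison is made.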
If desired, infinite cycles (i.e. cycles whose death value is equal to the maximum distance threshold parameter for the persistent homology calculation)
can be analyzed for significance by determining, for each one, the minimum distance threshold at which it would be significant (again using the Gumbel distribution),
recalculating the persistence diagram up to that threshold and checking whether the cycle is still infinite (i.e. significant) or not.
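The minimum threshold for an infinite cycle can be illustrated by inverting the test statistic. The sketch below is an assumption-laden illustration, not the package's implementation: the helper name min_significant_death is hypothetical, A = 1 and B = 0 are placeholder constants, and the p-value is again assumed to be exp(-exp(A log log \pi + B)). Solving p-value < threshold for the death value gives death > birth * exp(exp((log(-log(threshold)) - B) / A)).

```r
# Hypothetical helper (not part of the package API): smallest death value
# at which a cycle born at `birth` would have a p-value below `p_thresh`,
# assuming the p-value is exp(-exp(A * log(log(death / birth)) + B)).
min_significant_death <- function(birth, p_thresh, A = 1, B = 0) {
  stopifnot(birth > 0, p_thresh > 0, p_thresh < 1)
  birth * exp(exp((log(-log(p_thresh)) - B) / A))
}

# a cycle born at radius 0.02, tested against a threshold of 0.05
d_min <- min_significant_death(birth = 0.02, p_thresh = 0.05)
```

If the persistence diagram is recomputed with 'thresh' set to this value and the cycle has still not died, its true death value exceeds the minimum and the cycle is significant under these assumptions.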
This function is significantly faster than the bootstrap_persistence_thresholds function. Note that the 'calculate_homology' function does not seem to store infinite cycles (i.e. cycles that have death value equal to 'thresh').
a list containing the full persistence diagram, the subsetted diagram, representatives and/or subsetted representatives if desired, the p-values of subsetted features and the Bonferroni p-value thresholds in each dimension if desired.
Shael Brown - shaelebrown@gmail.com
Bobrowski O, Skraba P (2023). "A universal null-distribution for topological data analysis." https://www.nature.com/articles/s41598-023-37842-2.
if(require("TDA"))
{
# create dataset
theta <- runif(n = 100,min = 0,max = 2*pi)
x <- cos(theta)
y <- sin(theta)
circ <- data.frame(x = x,y = y)
# add noise
x_noise <- -0.1 + 0.2*stats::runif(n = 100)
y_noise <- -0.1 + 0.2*stats::runif(n = 100)
circ$x <- circ$x + x_noise
circ$y <- circ$y + y_noise
# determine significant topological features
library(TDA)
res <- universal_null(circ, thresh = 2,
alpha = 0.1,
return_pvals = TRUE,
FUN_diag = "ripsDiag")
res$subsetted_diag
res$pvals
res$alpha_thresh
# at a lower threshold we can check for
# infinite cycles
res2 <- universal_null(circ, thresh = 1.1,
infinite_cycle_inference = TRUE,
alpha = 0.1,
FUN_diag = "ripsDiag")
res2$subsetted_diag
}