permutation_model_inference | R Documentation |
An inference procedure to determine if two datasets were unlikely to be generated by the same process (i.e. if the persistence diagram of one dataset is a good model of the persistence diagram of the other dataset).
permutation_model_inference(
  D1,
  D2,
  iterations,
  num_samples,
  dims = c(0, 1),
  samp = NULL,
  paired = F,
  num_workers = parallelly::availableCores(omit = 1),
  verbose = F,
  FUN_boot = "calculate_homology",
  thresh,
  distance_mat = FALSE,
  ripser = NULL,
  return_diagrams = FALSE
)
D1 |
the first dataset (a data frame). |
D2 |
the second dataset (a data frame). |
iterations |
the number of iterations for permuting group labels, default 20. |
num_samples |
the number of bootstrap iterations, default 30. |
dims |
a non-negative integer vector of the homological dimensions in which the test is to be carried out, default c(0,1). |
samp |
an optional list of row-number samples of 'D1', default NULL. See details and examples for more information. Ignored when 'paired' is FALSE. |
paired |
a boolean flag for if there is a second-order pairing between diagrams at the same index in different groups, default FALSE. |
num_workers |
the number of cores used for parallel computation, default is one less than the number of cores on the machine. |
verbose |
a boolean flag for whether the time duration of the function call should be printed, default FALSE. |
FUN_boot |
a string representing the persistent homology function to use for calculating the bootstrapped persistence diagrams, either 'calculate_homology' (the default), 'PyH' or 'ripsDiag'. |
thresh |
the positive numeric maximum radius of the Vietoris-Rips filtration. |
distance_mat |
a boolean representing if 'D1' and 'D2' are distance matrices (TRUE) or not (FALSE, default). |
ripser |
the imported ripser module when 'FUN_boot' is 'PyH'. |
return_diagrams |
whether or not to return the two lists of bootstrapped persistence diagrams, default FALSE. |
Inference is carried out by generating bootstrap resampled persistence diagrams from the two datasets and carrying out a permutation test on the resulting two groups. A small p-value in a certain dimension suggests that the datasets are not good models of each other in that dimension. 'samp' should only be provided when 'paired' is TRUE, in order to generate the same row samplings of 'D1' and 'D2' for the bootstrapped persistence diagrams. Using the same row samplings makes a paired permutation test appropriate, and a paired test has higher statistical power for detecting topological differences. See the examples for how to properly supply 'samp'.
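The two stages described above (bootstrap resampling, then a permutation test on the group labels) can be illustrated with a toy sketch in base R. This is not the package's implementation: as a loudly stated assumption, a scalar summary (the mean pairwise distance of the resampled rows) stands in for a persistence diagram, and the absolute difference of two summaries stands in for a distance between diagrams. The helper names 'summarise_rows', 'boot_summary' and 'loss' are hypothetical, and the within-group loss is only one plausible choice of test statistic.

```r
# Toy sketch only: scalar summaries stand in for persistence diagrams.
set.seed(1)

# two hypothetical point clouds (assumed data, not from the package)
D1 <- data.frame(x = rnorm(50), y = rnorm(50))
D2 <- data.frame(x = rnorm(50, sd = 3), y = rnorm(50, sd = 3))

num_samples <- 30   # bootstrap replicates per dataset
iterations <- 200   # permutations of the group labels

# stand-in for computing a persistence diagram from resampled rows
summarise_rows <- function(D, rows) mean(dist(D[rows, ]))

# stage 1: bootstrap-resample the rows of a dataset with replacement;
# unique() mirrors the way 'samp' is built in the examples on this page
boot_summary <- function(D) {
  vapply(seq_len(num_samples), function(i) {
    rows <- unique(sample(seq_len(nrow(D)), size = nrow(D), replace = TRUE))
    summarise_rows(D, rows)
  }, numeric(1))
}
g1 <- boot_summary(D1)
g2 <- boot_summary(D2)

# within-group loss: sum of pairwise differences inside each group
loss <- function(a, b) sum(dist(a)) + sum(dist(b))

# stage 2: permute the group labels and recompute the loss each time
observed <- loss(g1, g2)
pooled <- c(g1, g2)
perm_losses <- replicate(iterations, {
  idx <- sample(length(pooled), length(g1))
  loss(pooled[idx], pooled[-idx])
})

# a small observed loss relative to the permuted losses gives a small p-value
p_value <- (sum(perm_losses <= observed) + 1) / (iterations + 1)
p_value
```

In this sketch the two groups of summaries are well separated, so shuffling the labels almost always increases the within-group loss and the p-value comes out small, which is the situation the text above describes as the datasets not being good models of each other.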
a list which contains the output of the call to permutation_test and, if desired, the two groups of bootstrapped persistence diagrams in entries called 'diagrams1' and 'diagrams2'.
Shael Brown - shaelebrown@gmail.com
Robinson T, Turner K (2017). "Hypothesis testing for topological data analysis." https://link.springer.com/article/10.1007/s41468-017-0008-7.
Chazal F et al (2017). "Robust Topological Inference: Distance to a Measure and Kernel Distance." https://www.jmlr.org/papers/volume18/15-484/15-484.pdf.
Abdallah H et al. (2021). "Statistical Inference for Persistent Homology applied to fMRI." https://github.com/hassan-abdallah/Statistical_Inference_PH_fMRI/blob/main/Abdallah_et_al_Statistical_Inference_PH_fMRI.pdf.
permutation_test for an inferential group difference test for groups of persistence diagrams and bootstrap_persistence_thresholds for computing confidence sets for persistence diagrams.
if(require("TDAstats"))
{
  # create two datasets
  D1 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100, 10),],
                                     dim = 0, threshold = 2)
  D2 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100, 10),],
                                     dim = 0, threshold = 2)

  # do model inference test with 1 iteration (for speed, more
  # iterations should be used in practice)
  model_test <- permutation_model_inference(D1, D2, iterations = 1,
                                            thresh = 1.75, num_samples = 3,
                                            num_workers = 2L)

  # with more iterations, p-values show a difference in the
  # clustering of points but not in the arrangement of loops
  model_test$p_values

  # to supply samp, when we believe there is a correspondence between
  # the rows in D1 and the rows in D2
  # note that the number of entries of samp (3 in this case) must
  # match the num_samples parameter to the function call
  samp <- lapply(X = 1:3, FUN = function(X){
    return(unique(sample(1:nrow(D1), size = nrow(D1), replace = TRUE)))
  })

  # model inference will theoretically have higher power now for a
  # paired test
  model_test2 <- permutation_model_inference(D1, D2, iterations = 1,
                                             thresh = 1.75, num_samples = 3,
                                             paired = TRUE, samp = samp,
                                             num_workers = 2L)
  model_test2$p_values
}
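Before passing 'samp' to a paired call, it may help to check that it has one entry per bootstrap replicate and that every entry indexes valid rows of both datasets. The following base R sketch shows one way to do that; the two small data frames are hypothetical stand-ins, and the checks are an illustration of the constraints stated in the comments above rather than validation performed by the package.

```r
set.seed(42)

# hypothetical datasets with corresponding rows (assumed for illustration)
D1 <- data.frame(x = runif(10), y = runif(10))
D2 <- data.frame(x = runif(10), y = runif(10))
num_samples <- 3

# one vector of row numbers per bootstrap replicate, built as in the examples
samp <- lapply(seq_len(num_samples), function(i) {
  unique(sample(seq_len(nrow(D1)), size = nrow(D1), replace = TRUE))
})

# checks: one entry per replicate, and valid row numbers for both datasets
stopifnot(length(samp) == num_samples)
stopifnot(nrow(D1) == nrow(D2))
stopifnot(all(unlist(samp) %in% seq_len(nrow(D1))))

# applying the same row samplings to both datasets keeps replicates paired
boot1 <- lapply(samp, function(rows) D1[rows, ])
boot2 <- lapply(samp, function(rows) D2[rows, ])
```

Because replicate i of 'boot1' and replicate i of 'boot2' are built from the same row numbers, the bootstrapped diagrams at the same index correspond to each other, which is the pairing that the 'paired' argument assumes.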