# bootstrapDiagram: Bootstrapped Confidence Set for a Persistence Diagram, using... In TDA: Statistical Tools for Topological Data Analysis

## Description

The function `bootstrapDiagram` computes a (`1 - alpha`) confidence set for the persistence diagram of a filtration of sublevel sets (or superlevel sets) of a function evaluated over a grid of points. It returns the (`1 - alpha`) quantile of `B` bottleneck distances (or Wasserstein distances), one per iteration of the bootstrap algorithm.

## Usage

```r
bootstrapDiagram(
    X, FUN, lim, by, maxdimension = length(lim) / 2 - 1,
    sublevel = TRUE, library = "GUDHI", B = 30, alpha = 0.05,
    distance = "bottleneck", dimension = min(1, maxdimension),
    p = 1, parallel = FALSE, printProgress = FALSE, weight = NULL,
    ...)
```

## Arguments

| Argument | Description |
|---|---|
| `X` | an n by d matrix of coordinates, used by the function `FUN`, where n is the number of points stored in `X` and d is the dimension of the space. |
| `FUN` | a function whose inputs are 1) an n by d matrix of coordinates `X`, 2) an m by d matrix of coordinates `Grid`, 3) an optional smoothing parameter, and which returns a numeric vector of length m. For examples, see `distFct`, `kde`, and `dtm`, which compute the distance function, the kernel density estimator, and the distance to measure over a grid of points using the input `X`. Note that `Grid` is not an input of `bootstrapDiagram`; it is computed automatically from `lim` and `by`. |
| `lim` | a 2 by d matrix, where each column specifies the range of one dimension of the grid over which the function `FUN` is evaluated. |
| `by` | either a number or a vector of length d specifying the spacing between grid points in each dimension. If a single number is given, the same spacing is used in every dimension. |
| `maxdimension` | the maximum dimension up to which persistent homology is computed. The default value is d - 1, that is, (dimension of the embedding space - 1). |
| `sublevel` | a logical variable indicating whether the persistence diagram should be computed for sublevel sets (`TRUE`) or superlevel sets (`FALSE`) of the function. The default value is `TRUE`. |
| `library` | a string specifying which library is used to compute the persistence diagram. The user can choose `"GUDHI"`, `"Dionysus"`, or `"PHAT"`. The default value is `"GUDHI"`. |
| `B` | the number of bootstrap iterations. The default value is `30`. |
| `alpha` | the level of the confidence set: `bootstrapDiagram` returns the (`1 - alpha`) quantile. The default value is `0.05`. |
| `distance` | a string specifying the distance to be used for persistence diagrams: either `"bottleneck"` or `"wasserstein"`. The default value is `"bottleneck"`. |
| `dimension` | an integer or a vector specifying the dimension of the features used to compute the distance between diagrams: `0` for connected components, `1` for loops, `2` for voids, and so on. The default value is `1` if `maxdimension` ≥ 1, and `0` otherwise. |
| `p` | if `distance == "wasserstein"`, an integer specifying the power to be used in the computation of the Wasserstein distance. The default value is `1`. |
| `parallel` | logical: if `TRUE`, the bootstrap iterations are parallelized using the library `parallel`. The default value is `FALSE`. |
| `printProgress` | logical: if `TRUE`, a progress bar is printed. The default value is `FALSE`. |
| `weight` | either `NULL`, a number, or a vector of length n. If it is `NULL`, no weights are used. If it is a number, the same weight is applied to each point of `X`. If it is a vector, `weight` gives the weight of each point of `X`. The default value is `NULL`. |
| `...` | additional parameters for the function `FUN`. |
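
The relationship between `lim`, `by`, `Grid`, and `FUN` can be illustrated with a small base R sketch. It assumes the grid is the Cartesian product of evenly spaced sequences in each dimension, which this page does not state explicitly; `nn_dist` is a hypothetical function written only to show the signature `FUN` is expected to have, and it is not part of the package.

```r
# Grid limits: one column per dimension, rows are (min, max).
lim <- cbind(c(0, 1), c(0, 2))
by  <- 0.5

# Default maxdimension from the Usage section: length(lim) / 2 - 1 = d - 1.
d <- length(lim) / 2
maxdimension <- d - 1

# Assumed grid construction: Cartesian product of per-dimension sequences.
by_vec <- rep(by, length.out = d)  # a single number is reused in each dimension
axes <- lapply(seq_len(d), function(j) seq(lim[1, j], lim[2, j], by = by_vec[j]))
Grid <- as.matrix(expand.grid(axes))  # m by d matrix of grid coordinates

# Hypothetical FUN: distance from each grid point to its nearest point of X.
# It takes an n by d matrix X and an m by d matrix Grid, and returns a
# numeric vector of length m.
nn_dist <- function(X, Grid, ...) {
  X <- as.matrix(X)
  Grid <- as.matrix(Grid)
  apply(Grid, 1, function(g) sqrt(min(colSums((t(X) - g)^2))))
}

X <- rbind(c(0, 0), c(1, 2))
values <- nn_dist(X, Grid)
length(values) == nrow(Grid)
```

Any function with this shape could in principle be passed as `FUN`, with extra arguments such as a bandwidth forwarded through `...`.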

## Details

The function `bootstrapDiagram` uses `gridDiag` to compute the persistence diagram of the input function using the entire sample. The bootstrap algorithm then repeats the following step `B` times: it computes the persistence diagram from a subsample and measures the distance (bottleneck or Wasserstein, as selected by `distance`) between that diagram and the original one. Finally, the (`1-alpha`) quantile of these `B` values is returned. See Chazal, Fasy, Lecci, Michel, Rinaldo, and Wasserman (2014) for a discussion of the method.
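
The loop described above can be sketched in base R. This is an illustration of the structure only, not the package's implementation: `bootstrap_quantile`, `summarise`, and `distance` are hypothetical names, the toy summary (column means) and toy distance (largest absolute difference) merely stand in for a persistence diagram and the bottleneck distance, and drawing each subsample by resampling rows with replacement is an assumption.

```r
# Generic bootstrap quantile: compare a summary of the full sample with the
# same summary computed on B resamples, then take the (1 - alpha) quantile.
bootstrap_quantile <- function(X, summarise, distance, B = 30, alpha = 0.05) {
  X <- as.matrix(X)
  n <- nrow(X)
  full <- summarise(X)  # summary of the entire sample
  dists <- vapply(seq_len(B), function(b) {
    idx <- sample.int(n, size = n, replace = TRUE)  # assumed resampling scheme
    distance(full, summarise(X[idx, , drop = FALSE]))
  }, numeric(1))
  unname(quantile(dists, probs = 1 - alpha))
}

# Toy stand-ins (NOT persistence diagrams): column means and sup-distance.
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
width <- bootstrap_quantile(
  X,
  summarise = colMeans,
  distance  = function(a, b) max(abs(a - b)),
  B = 50, alpha = 0.05
)
width
```

In the setting of this page, the summary would be the diagram returned by `gridDiag` and the distance would be the bottleneck or Wasserstein distance between two diagrams.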

## Value

The function `bootstrapDiagram` returns the (`1-alpha`) quantile of the values computed by the bootstrap algorithm.
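
One way to use the returned value is to compare feature lifetimes against twice the quantile, mirroring the band `2 * cc` drawn in the example on this page. The sketch below assumes a diagram stored as a matrix with columns `dimension`, `Birth`, and `Death`; the numbers are made up for illustration, and the rule "lifetime greater than `2 * cc`" is an assumed reading of that band rather than something this page states.

```r
# Toy diagram with made-up values (superlevel sets, so Birth > Death).
diagram <- rbind(
  c(0, 0.90, 0.00),
  c(0, 0.30, 0.25),
  c(1, 0.60, 0.10),
  c(1, 0.20, 0.15)
)
colnames(diagram) <- c("dimension", "Birth", "Death")

cc <- 0.1  # hypothetical value returned by bootstrapDiagram

# Lifetime of each feature; abs() covers both sublevel and superlevel sets.
lifetime <- abs(diagram[, "Death"] - diagram[, "Birth"])

# Assumed rule: a feature stands out from noise if its lifetime exceeds 2 * cc.
significant <- lifetime > 2 * cc
diagram[significant, , drop = FALSE]
```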

## Note

The function `bootstrapDiagram` uses the C++ library Dionysus for the computation of bottleneck and Wasserstein distances. See references.

## Author(s)

Jisu Kim and Fabrizio Lecci

## References

Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.

Wasserman L (2004). "All of Statistics: A Concise Course in Statistical Inference." Springer.

Morozov D (2007). "Dionysus, a C++ library for computing persistent homology." https://www.mrzv.org/software/dionysus/

## See Also

`bottleneck`, `bootstrapBand`, `distFct`, `kde`, `kernelDist`, `dtm`, `summary.diagram`, `plot.diagram`

## Examples

```r
library(TDA)

## confidence set for the Kernel Density Diagram

# input data
n <- 400
XX <- circleUnif(n)

## Ranges of the grid
Xlim <- c(-1.8, 1.8)
Ylim <- c(-1.6, 1.6)
lim <- cbind(Xlim, Ylim)
by <- 0.05

h <- .3  # bandwidth for the function kde

# Kernel Density Diagram of the superlevel sets
Diag <- gridDiag(XX, kde, lim = lim, by = by, sublevel = FALSE,
                 printProgress = TRUE, h = h)

# confidence set
B <- 10       ## the number of bootstrap iterations should be higher!
              ## this is just an example
alpha <- 0.05

cc <- bootstrapDiagram(XX, kde, lim = lim, by = by, sublevel = FALSE, B = B,
                       alpha = alpha, dimension = 1, printProgress = TRUE,
                       h = h)

# plot the diagram with a confidence band of width 2 * cc
plot(Diag[["diagram"]], band = 2 * cc)
```