# bootstrapDiagram: Bootstrapped Confidence Set for a Persistence Diagram, using... In TDA: Statistical Tools for Topological Data Analysis

 bootstrapDiagram R Documentation

## Bootstrapped Confidence Set for a Persistence Diagram, using the Bottleneck Distance (or the Wasserstein distance).

### Description

The function `bootstrapDiagram` computes a `(1-alpha)` confidence set for the Persistence Diagram of a filtration of sublevel sets (or superlevel sets) of a function evaluated over a grid of points. The function returns the (`1-alpha`) quantile of `B` bottleneck distances (or Wasserstein distances), computed in `B` iterations of the bootstrap algorithm.

### Usage

```r
bootstrapDiagram(
  X, FUN, lim, by, maxdimension = length(lim) / 2 - 1,
  sublevel = TRUE, library = "GUDHI", B = 30, alpha = 0.05,
  distance = "bottleneck", dimension = min(1, maxdimension),
  p = 1, parallel = FALSE, printProgress = FALSE, weight = NULL,
  ...)
```

### Arguments

- `X`: an n by d matrix of coordinates, used by the function `FUN`, where n is the number of points stored in `X` and d is the dimension of the space.
- `FUN`: a function whose inputs are 1) an n by d matrix of coordinates `X`, 2) an m by d matrix of coordinates `Grid`, and 3) an optional smoothing parameter, and which returns a numeric vector of length m. For examples, see `distFct`, `kde`, and `dtm`, which compute the distance function, the kernel density estimator, and the distance to measure over a grid of points using the input `X`. Note that `Grid` is not an input of `bootstrapDiagram`; it is computed automatically from `lim` and `by`.
- `lim`: a 2 by d matrix, where each column specifies the range of the corresponding dimension of the grid over which the function `FUN` is evaluated.
- `by`: either a number or a vector of length d specifying the spacing between grid points in each dimension. If a single number is given, the same spacing is used in every dimension.
- `maxdimension`: a number indicating the maximum dimension up to which persistent homology is computed. The default value is d - 1, that is, (dimension of the embedding space - 1).
- `sublevel`: a logical variable indicating whether the Persistence Diagram should be computed for sublevel sets (`TRUE`) or superlevel sets (`FALSE`) of the function. The default value is `TRUE`.
- `library`: a string specifying which library to use to compute the persistence diagram. The user can choose `"GUDHI"`, `"Dionysus"`, or `"PHAT"`. The default value is `"GUDHI"`.
- `B`: the number of bootstrap iterations. The default value is `30`.
- `alpha`: the level of the confidence set; `bootstrapDiagram` returns a (`1 - alpha`) quantile. The default value is `0.05`.
- `distance`: a string specifying the distance to be used for persistence diagrams: either `"bottleneck"` or `"wasserstein"`. The default value is `"bottleneck"`.
- `dimension`: an integer or a vector specifying the dimension of the features used to compute the bottleneck (or Wasserstein) distance: `0` for connected components, `1` for loops, `2` for voids, and so on. The default value is `1` if `maxdimension` ≥ 1, and `0` otherwise.
- `p`: if `distance == "wasserstein"`, then `p` is an integer specifying the power to be used in the computation of the Wasserstein distance. The default value is `1`.
- `parallel`: logical: if `TRUE`, the bootstrap iterations are parallelized using the library `parallel`. The default value is `FALSE`.
- `printProgress`: if `TRUE`, a progress bar is printed. The default value is `FALSE`.
- `weight`: either `NULL`, a number, or a vector of length n. If it is `NULL`, no weight is used. If it is a number, the same weight is applied to each point of `X`. If it is a vector, `weight` gives the weight of each point of `X`. The default value is `NULL`.
- `...`: additional parameters for the function `FUN`.
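To make the roles of `lim`, `by`, and `FUN` concrete, the following base-R sketch builds a grid from a 2 by d `lim` matrix and evaluates a hand-written function with the `(X, Grid, ...)` signature. The helper names `make_grid` and `nearest_dist` are hypothetical and not part of TDA, and the package's internal grid construction may differ in detail.

```r
# Hypothetical helper (not part of TDA): build an m by d grid from a
# 2 by d `lim` matrix and a spacing `by` (one number or a vector of length d).
make_grid <- function(lim, by) {
  d <- ncol(lim)
  by <- rep(by, length.out = d)
  axes <- lapply(seq_len(d), function(j) seq(lim[1, j], lim[2, j], by = by[j]))
  as.matrix(expand.grid(axes))
}

# Hypothetical FUN: distance from each grid point to its nearest point in X.
# It follows the documented contract: X is n by d, Grid is m by d,
# and the result is a numeric vector of length m.
nearest_dist <- function(X, Grid, ...) {
  apply(Grid, 1, function(g) {
    sqrt(min(rowSums(sweep(X, 2, g)^2)))
  })
}

lim <- cbind(c(-1, 1), c(-1, 1))   # 2 by d matrix with d = 2
Grid <- make_grid(lim, by = 0.5)   # 5 values per axis, so 25 grid points
X <- rbind(c(0, 0), c(1, 1))       # n = 2 sample points
vals <- nearest_dist(X, Grid)      # one value per grid point

# The default for maxdimension in the usage line evaluates to d - 1.
maxdim_default <- length(lim) / 2 - 1
```

A function written this way could, in principle, be passed as `FUN`, with any extra tuning parameters supplied through `...`.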

### Details

The function `bootstrapDiagram` uses `gridDiag` to compute the persistence diagram of the input function using the entire sample. The bootstrap step is then repeated `B` times: each iteration computes the bottleneck (or Wasserstein) distance between the original persistence diagram and one computed from a bootstrap sample. Finally, the (`1-alpha`) quantile of these `B` values is returned. See Chazal, Fasy, Lecci, Michel, Rinaldo, and Wasserman (2014) for a discussion of the method.
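The resample, compare, and take-a-quantile structure of this procedure can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: `bootstrap_quantile`, `estimate`, and `dist_fun` are hypothetical names, rows are assumed to be resampled with replacement, and a toy summary (column means) with a toy distance (largest absolute coordinate difference) stands in for the persistence diagram and the bottleneck distance so that the sketch runs without TDA.

```r
# Hypothetical sketch of the bootstrap loop; not the TDA implementation.
# `estimate` maps a sample to a summary (in TDA: a persistence diagram from gridDiag).
# `dist_fun` compares two summaries (in TDA: bottleneck or Wasserstein distance).
bootstrap_quantile <- function(X, estimate, dist_fun, B = 30, alpha = 0.05) {
  full <- estimate(X)                          # summary from the entire sample
  n <- nrow(X)
  widths <- vapply(seq_len(B), function(b) {
    idx <- sample.int(n, n, replace = TRUE)    # assumed: resample rows with replacement
    dist_fun(full, estimate(X[idx, , drop = FALSE]))
  }, numeric(1))
  unname(quantile(widths, 1 - alpha))          # (1 - alpha) quantile of the B distances
}

# Toy stand-ins: the summary is the vector of column means, and the
# distance is the largest absolute coordinate difference.
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
cc <- bootstrap_quantile(X, estimate = colMeans,
                         dist_fun = function(a, b) max(abs(a - b)),
                         B = 50, alpha = 0.05)
```

Passing the summary and the distance as functions keeps the loop independent of how diagrams are computed, which mirrors the way `bootstrapDiagram` delegates to `gridDiag` and to a diagram distance.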

### Value

The function `bootstrapDiagram` returns the (`1-alpha`) quantile of the values computed by the bootstrap algorithm.
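One way to use the returned value is as a threshold on feature lifetimes. A point of a diagram lies at bottleneck distance |Death - Birth| / 2 from the diagonal, so a feature falls outside a band of width `2 * cc` exactly when its lifetime exceeds `2 * cc`. The sketch below applies that rule to a made-up diagram; the numbers and the value of `cc` are hypothetical, chosen only for illustration.

```r
# Toy diagram in the three-column layout used by TDA diagrams
# (dimension, Birth, Death); the values are invented for illustration.
diagram <- cbind(dimension = c(0, 0, 1, 1),
                 Birth     = c(0.00, 0.10, 0.20, 0.30),
                 Death     = c(1.00, 0.15, 0.90, 0.34))
cc <- 0.05   # hypothetical value, standing in for the output of bootstrapDiagram

# Lifetime of each feature; abs() keeps the rule valid whichever of
# Birth and Death is larger.
lifetime <- abs(diagram[, "Death"] - diagram[, "Birth"])

# Features whose lifetime exceeds 2 * cc lie outside the band.
significant <- lifetime > 2 * cc
diagram[significant, , drop = FALSE]
```

Because the bootstrap distances are computed for the features selected by `dimension`, the threshold is most naturally applied to features of that dimension.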

### Note

The function `bootstrapDiagram` uses the C++ library Dionysus for the computation of bottleneck and Wasserstein distances. See references.

### Author(s)

Jisu Kim and Fabrizio Lecci

### References

Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.

Wasserman L (2004). "All of Statistics: A Concise Course in Statistical Inference." Springer.

Morozov D (2007). "Dionysus, a C++ library for computing persistent homology." https://www.mrzv.org/software/dionysus/

### See Also

`bottleneck`, `bootstrapBand`, `distFct`, `kde`, `kernelDist`, `dtm`, `summary.diagram`, `plot.diagram`

### Examples

```r
library(TDA)

## confidence set for the Kernel Density Diagram

# input data
n <- 400
XX <- circleUnif(n)

## Ranges of the grid
Xlim <- c(-1.8, 1.8)
Ylim <- c(-1.6, 1.6)
lim <- cbind(Xlim, Ylim)
by <- 0.05

h <- 0.3  # bandwidth for the function kde

# Kernel Density Diagram of the superlevel sets
Diag <- gridDiag(XX, kde, lim = lim, by = by, sublevel = FALSE,
                 printProgress = TRUE, h = h)

# confidence set
B <- 10  # kept small so the example runs quickly; use more iterations in practice
alpha <- 0.05

cc <- bootstrapDiagram(XX, kde, lim = lim, by = by, sublevel = FALSE, B = B,
                       alpha = alpha, dimension = 1, printProgress = TRUE, h = h)

# plot the diagram with a confidence band of width 2 * cc
plot(Diag[["diagram"]], band = 2 * cc)
```

TDA documentation built on March 30, 2022, 1:06 a.m.