computePersistenceBlock: A Vector Summary of the Persistence Block

View source: R/RcppExports.R

computePersistenceBlock    R Documentation

A Vector Summary of the Persistence Block

Description

For a given persistence diagram D=\{(b_i,p_i)\}_{i=1}^N (corresponding to a specified homological dimension), computePersistenceBlock() vectorizes the persistence block

f(x,y)=\sum_{i=1}^N \mathbf{1}_{E(b_i,p_i)}(x,y),

where E(b_i,p_i)=[b_i-\frac{\lambda_i}{2},b_i+\frac{\lambda_i}{2}]\times [p_i-\frac{\lambda_i}{2},p_i+\frac{\lambda_i}{2}] and \lambda_i=2\tau p_i with \tau\in (0,1]. Points in D with infinite persistence values are ignored.
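The definition above can be made concrete with a small base R sketch. The diagram values and the helper f() are made up for illustration and are not part of TDAvec; the code only evaluates the block function at a single point.

```r
# Toy diagram in birth-persistence coordinates (hypothetical values)
b   <- c(0.2, 0.5)   # births
p   <- c(0.4, 1.0)   # persistence values
tau <- 0.3

# Side length of each square block: lambda_i = 2 * tau * p_i
lambda <- 2 * tau * p

# Corners of the blocks E(b_i, p_i)
blocks <- data.frame(xmin = b - lambda / 2, xmax = b + lambda / 2,
                     ymin = p - lambda / 2, ymax = p + lambda / 2)

# f(x, y) counts how many blocks contain the point (x, y)
f <- function(x, y) {
  sum(x >= blocks$xmin & x <= blocks$xmax &
      y >= blocks$ymin & y <= blocks$ymax)
}

f(0.25, 0.45)  # 1: only the first block contains this point
```

Because lambda_i grows with p_i, points with larger persistence contribute larger blocks.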

Usage

computePersistenceBlock(D, homDim, xSeq, ySeq, tau=0.3)

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and persistence values respectively.

homDim

the homological dimension (0 for H_0, 1 for H_1, etc.). Rows in D are filtered based on this value.

xSeq

a numeric vector of increasing x (birth) values used for vectorization.

ySeq

a numeric vector of increasing y (persistence) values used for vectorization.

tau

a parameter in (0,1] controlling the block sizes: the block around (b_i,p_i) has side length 2*tau*p_i. Default is tau=0.3.

Details

The function extracts the rows of D whose first column equals homDim and computes the vector summary from these points on the grid defined by xSeq and ySeq. If D does not contain any points corresponding to homDim, a vector of zeros is returned.
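The selection step described here, together with the rule that points with infinite persistence are ignored, might look as follows in base R. selectPoints() is a hypothetical helper written for illustration; the package performs this step internally and its actual code may differ.

```r
# Toy diagram (made-up values): columns are dimension, birth, persistence
D <- cbind(dimension   = c(0, 0, 1),
           Birth       = c(0, 0, 0.3),
           Persistence = c(0.5, Inf, 0.2))

# Hypothetical helper, not part of TDAvec
selectPoints <- function(D, homDim) {
  keep <- D[, 1] == homDim & is.finite(D[, 3])
  D[keep, , drop = FALSE]   # drop = FALSE keeps a matrix even for 0 or 1 rows
}

nrow(selectPoints(D, homDim = 0))  # 1: the point with infinite persistence is dropped
nrow(selectPoints(D, homDim = 2))  # 0: no points in this dimension
```

An empty selection corresponds to the case in which the function returns a vector of zeros.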

Value

A numeric vector whose elements are the weighted averages of the persistence block computed over each cell of the two-dimensional grid constructed from xSeq=\{x_1,x_2,\ldots,x_n\} and ySeq=\{y_1,y_2,\ldots,y_m\}:

\Big(\frac{1}{\Delta x_1\Delta y_1}\int_{[x_1,x_2]\times [y_1,y_2]}f(x,y)wdA,\ldots,\frac{1}{\Delta x_{n-1}\Delta y_{m-1}}\int_{[x_{n-1},x_n]\times [y_{m-1},y_m]}f(x,y)wdA\Big)\in\mathbb{R}^{d},

where d=(n-1)(m-1), wdA=(x+y)dxdy, \Delta x_k=x_{k+1}-x_k and \Delta y_j=y_{j+1}-y_j.
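As a cross-check of the formula above, the cell averages can be evaluated directly in base R. This is a minimal sketch, not the package's implementation: pbReference() is a hypothetical helper, it takes finite birth and persistence vectors rather than a diagram matrix, and the ordering of the output (x index varying fastest) is an assumption. It uses the fact that the integral of x+y over a rectangle [u_1,u_2] x [v_1,v_2] equals (u_2-u_1)(v_2-v_1)(u_1+u_2+v_1+v_2)/2.

```r
# Hypothetical reference helper (not part of TDAvec)
pbReference <- function(b, p, xSeq, ySeq, tau = 0.3) {
  half <- tau * p                       # lambda_i / 2
  n <- length(xSeq); m <- length(ySeq)
  out <- matrix(0, nrow = n - 1, ncol = m - 1)
  for (k in seq_len(n - 1)) {
    for (j in seq_len(m - 1)) {
      # Overlap of every block with the cell [x_k, x_{k+1}] x [y_j, y_{j+1}]
      u1 <- pmax(b - half, xSeq[k]); u2 <- pmin(b + half, xSeq[k + 1])
      v1 <- pmax(p - half, ySeq[j]); v2 <- pmin(p + half, ySeq[j + 1])
      dx <- pmax(u2 - u1, 0); dy <- pmax(v2 - v1, 0)  # zero when there is no overlap
      # Integral of (x + y) over each overlap rectangle
      integral <- 0.5 * dx * dy * (u1 + u2 + v1 + v2)
      cellArea <- (xSeq[k + 1] - xSeq[k]) * (ySeq[j + 1] - ySeq[j])
      out[k, j] <- sum(integral) / cellArea
    }
  }
  as.vector(out)  # assumed ordering: x index varies fastest
}

# One point whose block [0, 1] x [0.5, 1.5] coincides with the single grid cell
pbReference(b = 0.5, p = 1, xSeq = c(0, 1), ySeq = c(0.5, 1.5), tau = 0.5)  # 1.5
```

The result has (n-1)(m-1) entries, one per grid cell.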

If homDim=0 and all the birth values are equal (e.g., zero), univariate persistence block functions are used instead for vectorization:

\Big(\frac{1}{\Delta y_1}\int_{y_1}^{y_2}f(y)ydy,\ldots,\frac{1}{\Delta y_{m-1}}\int_{y_{m-1}}^{y_m}f(y)ydy\Big)\in\mathbb{R}^{m-1},

where f(y)=\sum_{i=1}^N \mathbf{1}_{[p_i-\frac{\lambda_i}{2},p_i+\frac{\lambda_i}{2}]}(y) and \Delta y_j=y_{j+1}-y_j.
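The univariate case reduces to integrating y over the overlap of each interval with each grid cell, which has the closed form (v_2^2-v_1^2)/2 on [v_1,v_2]. The sketch below evaluates the formula in base R; pbReference1D() is a hypothetical helper for illustration only and takes a vector of finite persistence values.

```r
# Hypothetical reference helper (not part of TDAvec)
pbReference1D <- function(p, ySeq, tau = 0.3) {
  half <- tau * p                       # lambda_i / 2
  m <- length(ySeq)
  vapply(seq_len(m - 1), function(j) {
    # Overlap of every interval with the cell [y_j, y_{j+1}]
    v1 <- pmax(p - half, ySeq[j])
    v2 <- pmin(p + half, ySeq[j + 1])
    overlap <- v2 > v1
    # Integral of y over each overlap, averaged over the cell width
    sum(0.5 * (v2[overlap]^2 - v1[overlap]^2)) / (ySeq[j + 1] - ySeq[j])
  }, numeric(1))
}

# One point with interval [0.5, 1.5], split across two unit cells
pbReference1D(p = 1, ySeq = c(0, 1, 2), tau = 0.5)  # 0.375 0.625
```

The second entry is larger because the weight y gives more mass to the upper half of the interval.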

Author(s)

Umar Islambekov, Aleksei Luchinsky

References

1. Chan, K. C., Islambekov, U., Luchinsky, A., & Sanders, R. (2022). A computationally efficient framework for vector representation of persistence diagrams. Journal of Machine Learning Research 23, 1-33.

Examples

N <- 100 # The number of points to sample
set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

# Switch from the birth-death to the birth-persistence coordinates
D[,3] <- D[,3] - D[,2]
colnames(D)[3] <- "Persistence"

# Construct a one-dimensional grid of scale (persistence) values for H_0
ySeqH0 <- unique(quantile(D[D[,1] == 0, 3], probs = seq(0, 1, by = 0.2)))
tau <- 0.3 # Parameter in (0,1] controlling the size of the block around each point of the diagram

# Compute a vector summary of the persistence block for homological dimension H_0
computePersistenceBlock(D, homDim = 0, xSeq = NA, ySeq = ySeqH0, tau = tau)

# Construct a two-dimensional grid of birth (x) and persistence (y) values for H_1
xSeqH1 <- unique(quantile(D[D[,1] == 1, 2], probs = seq(0, 1, by = 0.2)))
ySeqH1 <- unique(quantile(D[D[,1] == 1, 3], probs = seq(0, 1, by = 0.2)))

# Compute a vector summary of the persistence block for homological dimension H_1
computePersistenceBlock(D, homDim = 1, xSeq = xSeqH1, ySeq = ySeqH1, tau = tau)


TDAvec documentation built on April 4, 2025, 1:37 a.m.