computeStats: Compute Descriptive Statistics for Births, Deaths, Midpoints,...


computeStats    R Documentation

Compute Descriptive Statistics for Births, Deaths, Midpoints, and Lifespans in a Persistence Diagram

Description

For a given persistence diagram D=\{(b_i,d_i)\}_{i=1}^N (corresponding to a specified homological dimension), computeStats() calculates descriptive statistics of the birth, death, midpoint (the average of birth and death), and lifespan (death minus birth) values. It also computes the total number of points and the entropy of the lifespan values. Points in D with infinite death values are ignored.

Usage

computeStats(D, homDim)

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

homDim

the homological dimension (0 for H_0, 1 for H_1, etc.). Rows in D are filtered based on this value.
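
As a rough illustration of the expected input layout, the sketch below builds a small diagram by hand and selects the rows for one homological dimension. The object name D_toy, its values, and the column names are made up for this example; only the three-column order (dimension, birth, death) comes from the argument description above.

```r
# Hypothetical toy diagram: one row per point, columns in the order
# homological dimension, birth, death (column names are illustrative only).
D_toy <- matrix(
  c(0, 0.0, 0.5,
    0, 0.0, 1.2,
    1, 0.4, 0.9,
    1, 0.6, 1.6),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("dimension", "birth", "death"))
)

# Rows that a call with homDim = 1 would operate on
# (drop = FALSE keeps the result a matrix even if only one row matches).
H1_rows <- D_toy[D_toy[, 1] == 1, , drop = FALSE]
H1_rows
```

A matrix in this shape could then be passed as D, with homDim set to 0 or 1.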

Details

The function extracts the rows of D whose first column equals homDim and computes, for each of the birth, death, midpoint, and lifespan (or persistence) values, the mean, standard deviation, median, interquartile range (IQR), range, and the 10th, 25th, 75th, and 90th percentiles. It also returns the total number of bars (points in the diagram) and the entropy of the lifespan values, -\sum_{i=1}^N \frac{l_i}{L}\log_2\left(\frac{l_i}{L}\right), where l_i = d_i - b_i is the lifespan of the i-th point and L = \sum_{i=1}^N l_i. If D contains no points corresponding to homDim, a vector of zeros is returned.
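
The computation described above can be approximated in base R as follows. This is a sketch for intuition only, not the package's implementation: describe() and manual_stats() are hypothetical helpers, and the choices of sample standard deviation, R's default quantile type, and range taken as max minus min are assumptions that may differ from what computeStats() does internally.

```r
# Hypothetical helper (not part of TDAvec): nine summaries of a numeric vector.
# Assumes sample sd, R's default quantile type, and range = max - min.
describe <- function(x) {
  q <- stats::quantile(x, probs = c(0.10, 0.25, 0.75, 0.90), names = FALSE)
  c(mean = mean(x), stddev = stats::sd(x), median = stats::median(x),
    iqr = stats::IQR(x), range = max(x) - min(x),
    p10 = q[1], p25 = q[2], p75 = q[3], p90 = q[4])
}

# Hypothetical re-implementation, for illustration only.
manual_stats <- function(D, homDim) {
  # Keep rows of the requested dimension with a finite death value
  keep <- D[, 1] == homDim & is.finite(D[, 3])
  if (!any(keep)) return(numeric(38))  # no points: vector of zeros

  b <- D[keep, 2]
  d <- D[keep, 3]
  l <- d - b                           # lifespans l_i = d_i - b_i
  p <- l / sum(l)                      # l_i / L, assuming L > 0
  p <- p[p > 0]                        # skip zero lifespans (0 * log2(0) is NaN)

  groups <- list(births = b, deaths = d,
                 midpoints = (b + d) / 2, lifespans = l)
  stats <- lapply(names(groups), function(g) {
    s <- describe(groups[[g]])
    names(s) <- paste(names(s), g, sep = "_")
    s
  })
  c(unlist(stats), total_bars = sum(keep), entropy = -sum(p * log2(p)))
}

# Toy input: two finite H_1 points with equal lifespans, one infinite point
D_toy <- rbind(c(1, 0.4, 0.9), c(1, 0.6, 1.1), c(1, 0.2, Inf), c(0, 0, 0.3))
s <- manual_stats(D_toy, homDim = 1)
s[c("total_bars", "entropy")]
```

Because the two retained lifespans are equal, each contributes l_i/L = 1/2, so the entropy should come out at 1 (up to floating-point rounding), and the point with an infinite death value is not counted in total_bars.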

Value

A (named) 38-dimensional numeric vector containing:

  • mean_births, stddev_births, median_births, iqr_births, range_births, p10_births, p25_births, p75_births, p90_births: Descriptive statistics for birth values.

  • mean_deaths, stddev_deaths, median_deaths, iqr_deaths, range_deaths, p10_deaths, p25_deaths, p75_deaths, p90_deaths: Descriptive statistics for death values.

  • mean_midpoints, stddev_midpoints, median_midpoints, iqr_midpoints, range_midpoints, p10_midpoints, p25_midpoints, p75_midpoints, p90_midpoints: Descriptive statistics for midpoint values (mean of birth and death values).

  • mean_lifespans, stddev_lifespans, median_lifespans, iqr_lifespans, range_lifespans, p10_lifespans, p25_lifespans, p75_lifespans, p90_lifespans: Descriptive statistics for lifespan (or persistence) values (difference between death and birth values).

  • total_bars: The total number of points in the specified homological dimension.

  • entropy: The entropy of the lifespan values.
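
To make the layout of the 38 entries concrete, the sketch below rebuilds the list of names given above from its two ingredients. The variable names are illustrative; the element order follows the bullet list (births, deaths, midpoints, lifespans, then total_bars and entropy) and is assumed to match the order of the returned vector.

```r
# Nine statistics for each of four groups, plus two extra entries
stat_names  <- c("mean", "stddev", "median", "iqr", "range",
                 "p10", "p25", "p75", "p90")
group_names <- c("births", "deaths", "midpoints", "lifespans")

expected_names <- c(
  paste(rep(stat_names, times = length(group_names)),
        rep(group_names, each = length(stat_names)),
        sep = "_"),
  "total_bars", "entropy"
)

length(expected_names)  # 9 * 4 + 2 entries
head(expected_names, 3)
```

Since the result is a named vector, individual entries can be selected by name, for example s[c("mean_lifespans", "entropy")] for a result stored in s.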

Author(s)

Umar Islambekov

References

1. Ali, D., Asaad, A., Jimenez, M.J., Nanda, V., Paluzo-Hidalgo, E. and Soriano-Trigueros, M., (2023). A survey of vectorization methods in topological data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Examples

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

# Compute statistics for homological dimension H_0
computeStats(D, homDim = 0)

# Compute statistics for homological dimension H_1
computeStats(D, homDim = 1)

TDAvec documentation built on April 4, 2025, 1:37 a.m.