Sub-Image Analysis using Topological Summary Statistics.
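The sub-image selection problem is to identify the physical regions that best explain the variation between two classes of three-dimensional shapes. SINATRA is a statistical pipeline for carrying out sub-image analyses using topological summary statistics. The algorithm follows four key steps:
1. Represent each 3D shape as a collection of topological summary statistics, namely Euler characteristic curves computed over a set of directions.
2. Fit a Gaussian process classification model relating these statistics to the two class labels.
3. Compute a relative centrality measure for each topological feature and keep the most informative ones.
4. Map the selected topological features back onto the original shapes to highlight the physical regions that best separate the two classes.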
Through detailed simulations, we assess the power of our algorithm as a function of its inputs. Lastly, as an application of our pipeline, we conduct feature selection on a dataset consisting of mandibular molars from different genera of New World Monkeys and examine the physical properties of their teeth that summarize their phylogeny.
Code for implementing the SINATRA pipeline was written in R (version 3.5.3). As part of this procedure:
Note that the package rgl requires X11 or XQuartz on macOS systems.
To install the package, we will use the remotes package and run the command:
remotes::install_github('lcrawlab/SINATRA')
Next, to load the package, use the command:
library(sinatra)
Other common installation procedures may also apply.
Other details of our implementation choices for the SINATRA algorithm are provided below.
In the first step of the SINATRA pipeline, we use a tool from differential geometry called the Euler characteristic (EC) transform (Turner, Mukherjee, and Boyer 2014; Ghrist, Levanger, and Mai 2018; Crawford et al. 2020) to represent 3D shapes as a collection of vector-valued topological summary statistics. To do so, after picking a set of directions in which to measure the ECs of each shape in our data, the algorithm runs the function compute_standardized_ec_curve.
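For intuition, the sketch below computes one EC curve for a triangle mesh in base R. It is not the package's compute_standardized_ec_curve: the name ec_curve_sketch, its arguments, and the threshold grid are assumptions. It assumes every shape has been centred and scaled into a ball of the given radius, so that all shapes share the same filtration thresholds, and it counts an edge or face as present once its highest vertex lies at or below the threshold, giving the Euler characteristic V - E + F of each sublevel set.

```r
# Sketch only: Euler characteristic (EC) curve of a triangle mesh in one direction.
# Assumes the mesh is centred and scaled to lie inside a ball of radius `radius`,
# so every shape is filtered over the same thresholds (hypothetical choice).
ec_curve_sketch <- function(vertices, edges, faces, direction,
                            n_steps = 10, radius = 1) {
  direction <- direction / sqrt(sum(direction^2))   # unit direction vector
  heights <- drop(vertices %*% direction)           # height of each vertex
  # An edge or face enters the sublevel set when its highest vertex does
  edge_h <- apply(edges, 1, function(e) max(heights[e]))
  face_h <- apply(faces, 1, function(f) max(heights[f]))
  thresholds <- seq(-radius, radius, length.out = n_steps)
  # Euler characteristic V - E + F of each sublevel set
  vapply(thresholds, function(t) {
    sum(heights <= t) - sum(edge_h <= t) + sum(face_h <= t)
  }, numeric(1))
}

# Usage: a single triangle filtered along the x-axis
vertices <- matrix(c(0, 0, 0,
                     1, 0, 0,
                     0, 1, 0), ncol = 3, byrow = TRUE)
edges <- matrix(c(1, 2,
                  2, 3,
                  1, 3), ncol = 2, byrow = TRUE)
faces <- matrix(c(1, 2, 3), ncol = 3)
ec_curve_sketch(vertices, edges, faces, direction = c(1, 0, 0), n_steps = 5)
#> [1] 0 0 1 1 1
```

The curve stays at 0 until the two vertices at height 0 and the edge joining them appear, and then stays at 1 once the whole triangle is present.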
If desired, the resulting EC curves can then be transformed, either smoothed or differentiated, using the function update_ec_curve.
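The two transformations can be illustrated in base R as follows. These are not the implementation of update_ec_curve, and the function names are hypothetical. The smoothed version follows the smooth Euler characteristic transform idea of integrating the mean-centred curve, approximated here by a cumulative sum that ignores the step width; the differentiated version takes first differences, which shortens each curve by one entry.

```r
# Sketch only: smoothing integrates the mean-centred EC curve (cumulative sum,
# step width taken as 1); differentiating takes first differences.
smooth_ec_curve_sketch <- function(curve) cumsum(curve - mean(curve))
diff_ec_curve_sketch <- function(curve) diff(curve)

curve <- c(0, 0, 1, 1, 1)
smooth_ec_curve_sketch(curve)
diff_ec_curve_sketch(curve)
#> [1] 0 1 0 0
```

Because every shape is transformed the same way, the shortened differentiated curves still line up column by column across shapes.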
For each shape in the dataset, EC curves are computed in every direction and then concatenated into a p-dimensional topological feature vector. For a study with n shapes, we therefore analyze an n × p design matrix, where each column holds the Euler characteristic computed at one combination of direction and filtration step.
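A minimal sketch of this concatenation step, assuming the curves are stored as a nested list (ec_curves[[i]][[d]] holds the curve of shape i in direction d) and that every curve has the same number of filtration steps. The function and column names are hypothetical. Columns are ordered direction by direction, so column (d - 1) * n_steps + s corresponds to direction d and step s.

```r
# Sketch only: stack per-shape EC curves into an n x p design matrix.
build_design_matrix_sketch <- function(ec_curves) {
  n_dir <- length(ec_curves[[1]])
  n_steps <- length(ec_curves[[1]][[1]])
  # Every curve must have the same length for the columns to line up
  stopifnot(all(unlist(lapply(ec_curves, lengths)) == n_steps))
  X <- do.call(rbind, lapply(ec_curves, unlist))  # one row per shape
  colnames(X) <- paste0("dir", rep(seq_len(n_dir), each = n_steps),
                        "_step", rep(seq_len(n_steps), times = n_dir))
  X
}

# Usage: 2 shapes, 3 directions, 4 filtration steps each -> 2 x 12 matrix
ec_curves <- lapply(1:2, function(i) {
  lapply(1:3, function(d) (1:4) + 10 * d + 100 * i)
})
X <- build_design_matrix_sketch(ec_curves)
dim(X)
#> [1]  2 12
```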
Recall that we are interested in identifying the physical features that best differentiate two classes of shapes. For this purpose, we use a Gaussian process classification model to assess the relationship between the topological summary statistics and the class labels. To perform variable selection on these statistics, we use relative centrality measures, a criterion that evaluates how much information about the shape classification is lost when a particular topological feature is removed from the model.
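The centrality idea can be sketched with Gaussian algebra. This sketch skips the Gaussian process fit entirely and assumes we already have an approximate Gaussian posterior, with mean mu and covariance Sigma, over per-feature effect sizes derived from the fitted classifier. For each feature j it computes the Kullback-Leibler divergence between the posterior of the remaining effects and their posterior conditioned on effect j being zero, then normalises the divergences to sum to one. The function names, the full-rank Sigma, and the order of the arguments in the divergence are assumptions; with more features than shapes the real covariance is rank-deficient and needs low-rank handling that is omitted here.

```r
# Sketch only: KL-divergence-based relative centrality for Gaussian effects.
# Assumes `mu` and a full-rank `Sigma` describe an approximate Gaussian
# posterior over per-feature effect sizes (hypothetical inputs).

# KL(N(mu0, S0) || N(mu1, S1))
gaussian_kl <- function(mu0, S0, mu1, S1) {
  k <- length(mu0)
  S1_inv <- solve(S1)
  d <- mu1 - mu0
  logdet <- function(S) as.numeric(determinant(S, logarithm = TRUE)$modulus)
  0.5 * (sum(diag(S1_inv %*% S0)) + drop(t(d) %*% S1_inv %*% d) - k +
           logdet(S1) - logdet(S0))
}

relative_centrality_sketch <- function(mu, Sigma) {
  p <- length(mu)
  kld <- numeric(p)
  for (j in seq_len(p)) {
    rest <- setdiff(seq_len(p), j)
    S_rr <- Sigma[rest, rest, drop = FALSE]
    S_rj <- Sigma[rest, j, drop = FALSE]
    # Posterior of the other effects given that effect j is exactly zero
    mu_cond <- mu[rest] - drop(S_rj) * mu[j] / Sigma[j, j]
    S_cond <- S_rr - S_rj %*% t(S_rj) / Sigma[j, j]
    # Divergence between the marginal and the conditional posterior
    kld[j] <- gaussian_kl(mu[rest], S_rr, mu_cond, S_cond)
  }
  kld / sum(kld)  # normalise so the centralities sum to one
}

mu <- c(2, 0.5, -1)
Sigma <- matrix(c(1.0, 0.3, 0.1,
                  0.3, 1.0, 0.2,
                  0.1, 0.2, 1.0), nrow = 3)
rate <- relative_centrality_sketch(mu, Sigma)
round(rate, 3)
```

If the effects were uncorrelated, fixing one at zero would leave the others unchanged and every divergence would be zero, so this measure is driven by the posterior correlations between features.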
Keeping the ECs whose centrality measures exceed a chosen threshold then yields a selected set of features worth further investigation.
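A sketch of the thresholding step, which also translates each kept column back into its (direction, filtration step) pair so it can be mapped onto the shape. The default threshold 1/p, the value every feature would take if centrality were spread uniformly, is only an illustrative choice, and the direction-by-direction column order is an assumption about how the design matrix was built. The function name is hypothetical.

```r
# Sketch only: keep features above a threshold and decode their column index.
select_features_sketch <- function(centrality, n_steps,
                                   threshold = 1 / length(centrality)) {
  keep <- which(centrality > threshold)   # columns above the threshold
  data.frame(
    column = keep,
    direction = (keep - 1) %/% n_steps + 1,  # columns ordered direction by direction
    step = (keep - 1) %% n_steps + 1
  )
}

centrality <- c(0.05, 0.30, 0.02, 0.03, 0.40, 0.20)  # 2 directions x 3 steps
select_features_sketch(centrality, n_steps = 3)
#>   column direction step
#> 1      2         1    2
#> 2      5         2    2
#> 3      6         2    3
```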
After obtaining a selected set of topological features, we map this information back onto the physical shape using the function compute_selected_vertices_cones. This generates the sub-image on a given shape in our dataset.
Alternatively, the function reconstruct_vertices_on_shape generates a heatmap on the shape, which can be interpreted as visualizing the importance of each subset of Euler characteristics with respect to the class labels.
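Both reconstruction ideas can be illustrated with a simplified rule that is an assumption of this sketch, not the rule used by compute_selected_vertices_cones or reconstruct_vertices_on_shape: a selected feature (direction d, step s) is credited to the vertices whose height along d falls in the filtration slice ending at step s, with thresholds evenly spaced on [-radius, radius]. Summing the centrality of the features that credit each vertex gives a per-vertex score that can be drawn as a heatmap, and the vertices with a positive score form a selected sub-image. Function and argument names are hypothetical.

```r
# Sketch only: credit each selected (direction, step) feature to the vertices
# whose height falls in that filtration slice, weighted by its centrality.
vertex_scores_sketch <- function(vertices, directions, features, centrality,
                                 n_steps, radius = 1) {
  thresholds <- seq(-radius, radius, length.out = n_steps)
  scores <- numeric(nrow(vertices))
  for (k in seq_len(nrow(features))) {
    dir <- directions[features$direction[k], ]
    dir <- dir / sqrt(sum(dir^2))
    h <- drop(vertices %*% dir)                       # vertex heights along dir
    s <- features$step[k]
    lower <- if (s == 1) -Inf else thresholds[s - 1]
    in_slice <- h > lower & h <= thresholds[s]        # vertices entering at step s
    scores[in_slice] <- scores[in_slice] + centrality[features$column[k]]
  }
  scores
}

vertices <- matrix(c(0, 0, 0,
                     1, 0, 0,
                     0, 1, 0), ncol = 3, byrow = TRUE)
directions <- rbind(c(1, 0, 0), c(0, 1, 0))
features <- data.frame(column = c(2, 6), direction = c(1, 2), step = c(2, 3))
centrality <- c(0.05, 0.30, 0.02, 0.03, 0.40, 0.20)
scores <- vertex_scores_sketch(vertices, directions, features, centrality, n_steps = 3)
scores
#> [1] 0.3 0.0 0.5
which(scores > 0)
#> [1] 1 3
```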
The implementation is best understood by working through the examples in software/vignettes. We provide tutorials for running the full SINATRA pipeline on the following cases:
Other code specific to the analyses conducted in the paper can be found in the branch SINATRA/paper_results.
For questions or concerns, please contact Bruce Wang, Timothy Sudijono, or Lorin Crawford. We appreciate any feedback you may have on our repository and instructions.
B. Wang, T. Sudijono, H. Kirveslahti*, T. Gao, D.M. Boyer, S. Mukherjee, and L. Crawford. SINATRA: a sub-image analysis pipeline for selecting features that differentiate classes of 3D shapes. Annals of Applied Statistics. In Press.