In niaid/HDStIM: High Dimensional Stimulation Immune Mapping ('HDStIM')

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The primary function HDStIM() in the HDStIM package follows a heuristic approach to group cells into responding and non-responding. For a combination of cell population and stimulation type (e.g., CD127+ T-helper cells and interferon-alpha), HDStIM() starts by performing K-means clustering on the combined set of cells from stimulated and unstimulated samples. K-means clustering is performed on combined expression data of all the state (signaling/intracellular) markers. Upon clustering using a contingency table as shown below, a Fisher's exact test determines the effect size and the statistical significance of partitioning. Cells from the combinations that pass the Fisher's exact test (p-value < 0.05) are considered responding. An optional UMAP can also be calculated to visually verify the cell partitioning in responding and non-responding groups by using auxiliary plotting scripts provided in the package.

In addition to an auxiliary script to plot UMAPs, the package also comes with two other plotting scripts for K-means clustering and Fisher’s exact test and state marker density before and after mapping.

An example of the contingency table used for Fisher's exact test.

matrix(c(60, 40, 20, 80),nrow = 2, ncol = 2, 
       dimnames = list(c("Cluster1", "Cluster2"), c("Stim", "Unstim")))

To Run The Main HDStIM Function

As stated above, HDStIM() is the primary function of the HDStIM package. We will use the example data set chi11 (from mass cytometry) included in the package.

Note:chi11 is a minimal dataset included for unit testing only. Therefore, it does not represent a typical mass/flow cytometry assay.

library(HDStIM)

mapped_data <-  HDStIM(chi11$expr_data, chi11$state_markers,
                       chi11$cluster_col, chi11$stim_label,
                       chi11$unstim_label, seed_val = 123, 
                       umap = TRUE, umap_cells = 500, 
                       verbose = FALSE)

class(mapped_data)

attributes(mapped_data)

Output

HDStIM() returns a list with the mapped expression data, data to plot stacked bar plots to visualize the K-means and Fisher's exact test results, and data to plot the optional UMAPs. The list also includes tables containing statistical information from K-means and Fisher's exact test and other information passed as the function attributes.

head(mapped_data$response_mapping_main)

head(mapped_data$stacked_bar_plot_data)

head(mapped_data$umap_plot_data)

head(mapped_data$all_fisher_p_val)

head(mapped_data$all_k_means_data)

To Plot Diagnostic Figures

Plots Explaining K-means Clustering And Fisher's Exact Test

Using the stacked_bar_plot_data, plot_K_Fisher() generates bar plots showing the percentage of cells from the stimulated and unstimulated samples clustered in the two K-means clusters a given cell population and stimulation type.

plot_K_Fisher() returns a list of ggplot objects. If the path is specified, it can also render and save the plots in PNG format.

k_plots <- plot_K_Fisher(mapped_data, path = NULL, verbose = FALSE)
k_plots[[1]]

UMAP Plots To Visually Inspect Responding and Non-Responding Cell Mapping

Note: You can only generate these plots if you have asked UMAPs to be calculated in the HDStIM() function.

UMAP plots can be helpful for visually inspecting how well HDStIM() has mapped responding vs. non-responding cells for a cell population and stimulation type. plot_umap() also returns a list of ggplot objects and if the path is specified, it will render and save the plots in PNG format.

u_plots <- plot_umap(mapped_data, path = NULL, verbose = FALSE)
u_plots[[1]]

Distribution Plots for Individual State Marker before And After Mapping

For each state/signaling markers distribution plots shows the kernel density estimation of the pre HDStIM() data from both stimulated and unstimulated samples along with the density from cells from stimulated samples mapped as responding. plot_exprs() also returns a list of ggplot objects and if the path is specified, it will render and save the plots in PNG format.

e_plots <- plot_exprs(mapped_data, path = NULL,verbose = FALSE)
library(ggplot2)
e_plots[[1]] +
    theme(text = element_text(size = 11))

To Rank State/Signaling Markers According To Their Contribution To The Response

marker_ranking_boruta() function runs Boruta on the stimulation - cell population combinations that passed the Fisher's exact test to rank the markers according to their contribution to the response. The function returns a list with a tibble containing attribute statistics calculated by Boruta and ggplot objects. If the path is not NULL, plots are also rendered and saved in the specified folder in PNG format.

m_ranks <- marker_ranking_boruta(mapped_data, path = NULL, n_cells = NULL,
                                 max_runs = 100, seed_val = 123,
                                 verbose = FALSE)

head(m_ranks$attribute_stats)

m_ranks$plots[[1]]