tof_assess_clusters_entropy: Assess a clustering result by calculating the shannon entropy...
In keyes-timothy/tidytof: Analyze High-dimensional Cytometry Data Using Tidy Data Principles

tof_assess_clusters_entropy

R Documentation

Assess a clustering result by calculating the Shannon entropy of each cell's Mahalanobis distances to all cluster centroids and flagging outliers.

Description

This function evaluates the result of a clustering procedure by calculating the Mahalanobis distance between each cell and the centroid of every cluster in the dataset, then computing the Shannon entropy of the resulting vector of distances. All cells with an entropy above a user-specified threshold are flagged as potentially anomalous. Entropy is minimized (to 0) when a cell is close to one (or a small number) of the clusters but far from the rest. If a cell is close to multiple cluster centroids (i.e. has an ambiguous phenotype), its entropy will be large.
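
To make the entropy intuition concrete, here is a minimal base-R sketch. The Shannon entropy formula itself is standard, but how tidytof turns a cell's distance vector into the probability-like weights it feeds to that formula is not spelled out above. The `exp(-d)` weighting and the helper name `shannon_entropy()` are therefore illustrative assumptions, not the package's implementation.

```r
# Shannon entropy (natural log) of a vector of non-negative weights.
# The weights are rescaled to sum to 1 before the entropy is computed.
shannon_entropy <- function(weights) {
    p <- weights / sum(weights)
    p <- p[p > 0] # treat 0 * log(0) as 0
    -sum(p * log(p))
}

# Hypothetical distances from two cells to three cluster centroids
d_unambiguous <- c(a = 0.1, b = 10, c = 10) # close to one centroid only
d_ambiguous <- c(a = 1, b = 1, c = 1) # equally close to all three

# ASSUMPTION: distances are converted to similarity weights with exp(-d),
# so that nearby centroids receive large weights
entropy_unambiguous <- shannon_entropy(exp(-d_unambiguous))
entropy_ambiguous <- shannon_entropy(exp(-d_ambiguous))

entropy_unambiguous # close to 0
entropy_ambiguous # log(3), the maximum possible with three clusters
```

Under this weighting, the cell that sits next to a single centroid gets an entropy near 0, while the cell equidistant from all three centroids reaches the maximum, matching the behavior described above.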

Usage

tof_assess_clusters_entropy(
  tof_tibble,
  cluster_col,
  marker_cols = where(tof_is_numeric),
  entropy_threshold,
  entropy_quantile = 0.9,
  num_closest_clusters,
  augment = FALSE
)

Arguments

`tof_tibble`	A 'tof_tbl' or 'tibble'.
`cluster_col`	An unquoted column name indicating which column in 'tof_tibble' stores the cluster ids for the cluster to which each cell belongs. Cluster labels can be produced via any method the user chooses - including manual gating, any of the functions in the 'tof_cluster_*' function family, or any other method.
`marker_cols`	Unquoted column names indicating which columns in 'tof_tibble' should be interpreted as markers to be used in the Mahalanobis distance calculation. Defaults to all numeric columns. Supports tidyselection.
`entropy_threshold`	A scalar indicating the entropy threshold above which a cell should be considered anomalous. If unspecified, a threshold will be computed using 'entropy_quantile' (see below). (Note: Entropy is often between 0 and 1, but can be larger with many classes/clusters).
`entropy_quantile`	A scalar between 0 and 1 indicating the entropy quantile above which a cell should be considered anomalous. Defaults to 0.9, which means that cells with an entropy above the 90th percentile will be flagged. Ignored if entropy_threshold is specified directly.
`num_closest_clusters`	An integer indicating how many of a cell's closest cluster centroids should have their Mahalanobis distance included in the entropy calculation. Lowering this value lets you ignore distances to clusters that are far away from each cell (and thus may distort the result, as many distant centroids with large distances can artificially inflate a cell's entropy value; that said, this is rarely an issue empirically). Defaults to all clusters in 'tof_tibble'.
`augment`	A boolean value indicating if the output should column-bind the computed flags for each cell (see below) as new columns in 'tof_tibble' (TRUE) or if a tibble including only the computed flags should be returned (FALSE, the default).
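
The interplay between `entropy_threshold`, `entropy_quantile`, and `num_closest_clusters` can be sketched in base R. This is only an illustration under stated assumptions: the entropy values and distances are made up, and `flag_cells()` and `keep_closest()` are hypothetical helpers rather than tidytof functions.

```r
# Hypothetical per-cell entropy values
entropy <- c(0.05, 0.10, 0.20, 0.15, 0.90, 0.30, 0.25, 0.12, 0.08, 0.95)

# Flag cells above an explicit threshold if one is supplied;
# otherwise derive the threshold from the requested quantile
flag_cells <- function(entropy, entropy_threshold = NULL, entropy_quantile = 0.9) {
    if (is.null(entropy_threshold)) {
        entropy_threshold <-
            stats::quantile(entropy, probs = entropy_quantile, names = FALSE)
    }
    entropy > entropy_threshold
}

flagged_by_quantile <- flag_cells(entropy, entropy_quantile = 0.8)
flagged_by_threshold <- flag_cells(entropy, entropy_threshold = 0.5)

# Keep only the distances to a cell's k closest centroids
keep_closest <- function(distances, num_closest_clusters = length(distances)) {
    sort(distances)[seq_len(num_closest_clusters)]
}

closest <- keep_closest(
    c(a = 0.4, b = 0.6, c = 12, d = 15),
    num_closest_clusters = 2
)
closest
```

In this toy vector both strategies flag the same two high-entropy cells, and restricting to the two closest centroids drops the distant clusters `c` and `d` before any entropy would be computed.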

Value

If augment = FALSE (the default), a tibble with 2 + NUM_CLUSTERS columns, where NUM_CLUSTERS is the number of unique clusters in cluster_col. Two of the columns will be "entropy" (the entropy value for each cell) and "flagged_cell" (a boolean value indicating if each cell had an entropy value above entropy_threshold). The other NUM_CLUSTERS columns will contain the Mahalanobis distances from each cell to each of the clusters in cluster_col (named ".mahalanobis_{cluster_name}"). If augment = TRUE, the same 2 + NUM_CLUSTERS columns will be column-bound to tof_tibble, and the resulting tibble will be returned.
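
For intuition about the ".mahalanobis_{cluster_name}" columns, the sketch below computes one distance column per cluster with `stats::mahalanobis()` from base R. It assumes that each cluster's own centroid and covariance matrix are used and that the square root of the squared distance is reported. Neither detail is confirmed above, and the toy data and object names are hypothetical.

```r
set.seed(1)

# Toy data: two markers, two clusters of 50 cells each
markers <- rbind(
    matrix(rnorm(100, mean = 2), ncol = 2),
    matrix(rnorm(100, mean = -2), ncol = 2)
)
colnames(markers) <- c("cd45", "cd38")
cluster_id <- rep(c("b", "c"), each = 50)

# Compute one vector of distances (all cells to one centroid) per cluster
distance_cols <- lapply(
    split(as.data.frame(markers), cluster_id),
    function(cluster_cells) {
        # stats::mahalanobis() returns squared distances, so take the root
        sqrt(stats::mahalanobis(
            x = markers,
            center = colMeans(cluster_cells),
            cov = stats::cov(cluster_cells)
        ))
    }
)

# Assemble the columns and name them after their clusters
distances <- as.data.frame(distance_cols)
colnames(distances) <- paste0(".mahalanobis_", names(distance_cols))

head(distances)
```

Each row holds one cell's distances to every cluster centroid, which is the vector the entropy is then computed over.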

Examples


# load the package that provides tof_cluster() and tof_assess_clusters_entropy()
library(tidytof)

# simulate data
sim_data <-
    dplyr::tibble(
        cd45 = c(rnorm(n = 1000, sd = 1.5), rnorm(n = 1000, mean = 2), rnorm(n = 1000, mean = -2)),
        cd38 = c(rnorm(n = 1000, sd = 1.5), rnorm(n = 1000, mean = 2), rnorm(n = 1000, mean = -2)),
        cd34 = c(rnorm(n = 1000, sd = 1.5), rnorm(n = 1000, mean = 2), rnorm(n = 1000, mean = -2)),
        cd19 = c(rnorm(n = 1000, sd = 1.5), rnorm(n = 1000, mean = 2), rnorm(n = 1000, mean = -2)),
        cluster_id = c(rep("a", 1000), rep("b", 1000), rep("c", 1000))
    )

# imagine a "reference" dataset in which "cluster a" isn't present
sim_data_reference <-
    sim_data |>
    dplyr::filter(cluster_id %in% c("b", "c"))

# if we cluster into the reference dataset, we will force all cells in
# cluster a into a population where they don't fit very well
sim_data <-
    sim_data |>
    tof_cluster(
        healthy_tibble = sim_data_reference,
        healthy_label_col = cluster_id,
        method = "ddpr"
    )

# we can evaluate the clustering quality by calculating the entropy of the
# mahalanobis distance vector from each cell to all cluster centroids
entropy_result <-
    sim_data |>
    tof_assess_clusters_entropy(
        cluster_col = .mahalanobis_cluster,
        marker_cols = starts_with("cd"),
        entropy_quantile = 0.8,
        augment = TRUE
    )

# most cells in "cluster a" are flagged, and few cells in the other clusters are
flagged_cluster_proportions <-
    entropy_result |>
    dplyr::group_by(cluster_id) |>
    dplyr::summarize(
        prop_flagged = mean(flagged_cell)
    )

keyes-timothy/tidytof documentation built on Aug. 28, 2024, 8:37 a.m.
