View source: R/quality_control.R
tof_assess_clusters_knn | R Documentation |
This function evaluates the result of a clustering procedure by finding each cell's K nearest neighbors, determining which cluster the majority of them are assigned to, and checking whether this matches the cell's own cluster assignment. If the majority cluster among a cell's nearest neighbors does not match the cell's own cluster assignment, the cell is flagged as potentially anomalous.
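The majority-vote check described above can be sketched in base R. This is only an illustration of the idea under simplifying assumptions (brute-force Euclidean distances, ties broken alphabetically); the helper name `flag_by_knn_vote` and its output column names are hypothetical and are not part of tidytof, whose actual implementation may differ.

```r
# Illustrative brute-force KNN majority vote (not tidytof's implementation).
flag_by_knn_vote <- function(markers, clusters, k = 10) {
  markers <- as.matrix(markers)
  clusters <- as.character(clusters)

  # Pairwise Euclidean distances between all cells
  d <- as.matrix(dist(markers, method = "euclidean"))
  diag(d) <- Inf # a cell should not count as its own neighbor
  k <- min(k, nrow(markers) - 1)

  knn_cluster <- vapply(
    seq_len(nrow(markers)),
    function(i) {
      neighbor_ids <- order(d[i, ])[seq_len(k)]
      votes <- table(clusters[neighbor_ids])
      names(votes)[which.max(votes)] # ties go to the alphabetically first label
    },
    character(1)
  )

  data.frame(
    knn_cluster = knn_cluster,
    mismatch = knn_cluster != clusters, # TRUE when the vote disagrees with the label
    stringsAsFactors = FALSE
  )
}

# Two well-separated groups of 20 cells, with one deliberately mislabeled cell
set.seed(1)
markers <- rbind(
  matrix(rnorm(40, mean = -10), ncol = 2),
  matrix(rnorm(40, mean = 10), ncol = 2)
)
clusters <- c(rep("a", 20), rep("b", 20))
clusters[1] <- "b"

result <- flag_by_knn_vote(markers, clusters, k = 5)
which(result$mismatch)
```

Because the two groups are far apart, only the mislabeled first cell should disagree with its neighbors' majority vote.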
tof_assess_clusters_knn(
  tof_tibble,
  cluster_col,
  marker_cols = where(tof_is_numeric),
  num_neighbors = min(10, nrow(tof_tibble)),
  distance_function = c("euclidean", "cosine", "l2", "ip"),
  augment = FALSE
)
tof_tibble |
A 'tof_tbl' or 'tibble'. |
cluster_col |
An unquoted column name indicating which column in 'tof_tibble' stores the cluster ids for the cluster to which each cell belongs. Cluster labels can be produced via any method the user chooses - including manual gating, any of the functions in the 'tof_cluster_*' function family, or any other method. |
marker_cols |
Unquoted column names indicating which columns in 'tof_tibble' should be interpreted as markers to be used in the nearest-neighbor distance calculation. Defaults to all numeric columns. Supports tidyselection. |
num_neighbors |
An integer indicating how many neighbors should be found during the nearest neighbor calculation. Defaults to 10, or to the number of rows in 'tof_tibble' if that is smaller. |
distance_function |
A string indicating which distance function should be used to perform the k nearest neighbor calculation. Options are "euclidean" (the default) and "cosine"; the usage signature also lists "l2" and "ip". |
augment |
A boolean value indicating if the output should column-bind the computed flags for each cell (see below) as new columns in 'tof_tibble' (TRUE) or if a tibble including only the computed flags should be returned (FALSE, the default). |
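To make the difference between the two documented distance_function options concrete, here is a small base-R sketch. It assumes the common convention that cosine distance is 1 minus cosine similarity; the exact definition used by tidytof's nearest-neighbor backend is not stated in this page, and the helper names are hypothetical.

```r
# Hypothetical helpers illustrating the two documented distance options
euclidean_distance <- function(x, y) sqrt(sum((x - y)^2))
cosine_distance <- function(x, y) {
  1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

cell_1 <- c(1, 2, 3)
cell_2 <- c(2, 4, 6) # same marker profile as cell_1 at twice the intensity

euclid <- euclidean_distance(cell_1, cell_2) # sensitive to overall magnitude
cosine <- cosine_distance(cell_1, cell_2)    # sensitive only to direction
```

Under these assumptions the Euclidean distance between the two cells is nonzero while the cosine distance is essentially zero, so the choice of distance function can change which cells count as nearest neighbors when overall marker intensity varies.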
If augment = FALSE (the default), a tibble with 2 columns: ".knn_cluster" (a character vector indicating which cluster received the majority vote of each cell's k nearest neighbors) and "flagged_cell" (a boolean value indicating whether the cell was flagged as potentially anomalous, that is, whether its cluster assignment did not match the majority vote (TRUE) or did match it (FALSE)). If augment = TRUE, the same 2 columns will be column-bound to tof_tibble, and the resulting tibble will be returned.
sim_data <-
  dplyr::tibble(
    cd45 = c(rnorm(n = 1000, sd = 1.5), rnorm(n = 1000, mean = 2), rnorm(n = 1000, mean = -2)),
    cd38 = c(rnorm(n = 1000, sd = 1.5), rnorm(n = 1000, mean = 2), rnorm(n = 1000, mean = -2)),
    cd34 = c(rnorm(n = 1000, sd = 1.5), rnorm(n = 1000, mean = 2), rnorm(n = 1000, mean = -2)),
    cd19 = c(rnorm(n = 1000, sd = 1.5), rnorm(n = 1000, mean = 2), rnorm(n = 1000, mean = -2)),
    cluster_id = c(rep("a", 1000), rep("b", 1000), rep("c", 1000))
  )

knn_result <-
  sim_data |>
  tof_assess_clusters_knn(
    cluster_col = cluster_id,
    num_neighbors = 10
  )
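One possible follow-up is to summarize, per cluster, how often the neighbors' majority vote disagrees with the assigned label. The sketch below uses a small stand-in data frame with made-up values shaped like the documented augment = TRUE output (an original cluster column plus ".knn_cluster"); the column `mismatch` is a hypothetical name computed here and is not returned by the function.

```r
# Stand-in for an augmented result; the values are invented for illustration
demo_result <- data.frame(
  cluster_id = c("a", "a", "a", "b", "b", "c"),
  .knn_cluster = c("a", "b", "a", "b", "b", "a"),
  stringsAsFactors = FALSE
)

# Recompute disagreement directly from the two label columns
demo_result$mismatch <- demo_result$.knn_cluster != demo_result$cluster_id

# Fraction of cells in each cluster whose neighbors voted for another cluster
mismatch_rate <- tapply(demo_result$mismatch, demo_result$cluster_id, mean)
mismatch_rate
```

A cluster with a high mismatch rate may be poorly separated from its neighbors in marker space and could be worth inspecting more closely.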