tof_upsample_distance: Upsample cells into the closest cluster in a reference...
In keyes-timothy/tidytof: Analyze High-dimensional Cytometry Data Using Tidy Data Principles

tof_upsample_distance

R Documentation

Upsample cells into the closest cluster in a reference dataset

Description

This function performs distance-based upsampling on CyTOF data by sorting single cells (passed into the function as 'tof_tibble') into their most phenotypically similar cell subpopulation in a reference dataset (passed into the function as 'reference_tibble'). It does so by calculating the distance (either mahalanobis, cosine, or pearson) between each cell in 'tof_tibble' and the centroid of each cluster in 'reference_tibble', then sorting cells into the cluster corresponding to their closest centroid.

Usage

tof_upsample_distance(
  tof_tibble,
  reference_tibble,
  reference_cluster_col,
  upsample_cols = where(tof_is_numeric),
  parallel_cols,
  distance_function = c("mahalanobis", "cosine", "pearson"),
  num_cores = 1L,
  return_distances = FALSE
)

Arguments

`tof_tibble`	A 'tibble' or 'tof_tbl' containing cells to be upsampled into their nearest reference subpopulation.
`reference_tibble`	A 'tibble' or 'tof_tibble' containing cells that have already been clustered or manually gated into subpopulations.
`reference_cluster_col`	An unquoted column name indicating which column in 'reference_tibble' contains the subpopulation label (or cluster id) for each cell in 'reference_tibble'.
`upsample_cols`	Unquoted column names indicating which columns in 'tof_tibble' to use in computing the distances used for upsampling. Defaults to all numeric columns in 'tof_tibble'. Supports tidyselect helpers.
`parallel_cols`	Optional. Unquoted column names indicating which columns in 'tof_tibble' to use for breaking up the data in order to parallelize the upsampling using 'foreach' on a 'doParallel' backend. Supports tidyselect helpers.
`distance_function`	A string indicating which distance function should be used to perform the upsampling. Options are "mahalanobis" (the default), "cosine", and "pearson".
`num_cores`	An integer indicating the number of CPU cores used to parallelize the classification. Defaults to 1 (a single core).
`return_distances`	A boolean value indicating whether or not the returned result should include only one column, the cluster ids corresponding to each row of 'tof_tibble' (return_distances = FALSE, the default), or if the returned result should include additional columns representing the distance between each row of 'tof_tibble' and each of the reference subpopulation centroids (return_distances = TRUE).

Value

If 'return_distances = FALSE', a tibble with one column named '.upsample_cluster', a character vector of length 'nrow(tof_tibble)' indicating the id of the reference cluster to which each cell (i.e. each row) in 'tof_tibble' was assigned.

If 'return_distances = TRUE', a tibble with 'nrow(tof_tibble)' rows and num_clusters + 1 columns, where num_clusters is the number of clusters in 'reference_tibble'. Each row represents a cell from 'tof_tibble', and num_clusters of the columns represent the distance between the cell and each of the reference subpopulations' cluster centroids. The final column represents the cluster id of the reference subpopulation with the minimum distance to the cell represented by that row.

Examples

# simulate single-cell data (and reference data with clusters to upsample
# into
sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cd34 = rnorm(n = 1000),
        cd19 = rnorm(n = 1000)
    )

reference_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 200),
        cd38 = rnorm(n = 200),
        cd34 = rnorm(n = 200),
        cd19 = rnorm(n = 200),
        cluster_id = c(rep("a", times = 100), rep("b", times = 100))
    )

# upsample using mahalanobis distance
tof_upsample_distance(
    tof_tibble = sim_data,
    reference_tibble = reference_data,
    reference_cluster_col = cluster_id
)

# upsample using cosine distance
tof_upsample_distance(
    tof_tibble = sim_data,
    reference_tibble = reference_data,
    reference_cluster_col = cluster_id,
    distance_function = "cosine"
)

keyes-timothy/tidytof documentation built on Aug. 28, 2024, 8:37 a.m.

keyes-timothy/tidytof index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

keyes-timothy/tidytof
Analyze High-dimensional Cytometry Data Using Tidy Data Principles

tof_upsample_distance: Upsample cells into the closest cluster in a reference...
In keyes-timothy/tidytof: Analyze High-dimensional Cytometry Data Using Tidy Data Principles

Upsample cells into the closest cluster in a reference dataset

Description

Usage

Arguments

Value

Examples

Related to tof_upsample_distance in keyes-timothy/tidytof...

R Package Documentation

Browse R Packages

We want your feedback!

keyes-timothy/tidytof Analyze High-dimensional Cytometry Data Using Tidy Data Principles

tof_upsample_distance: Upsample cells into the closest cluster in a reference... In keyes-timothy/tidytof: Analyze High-dimensional Cytometry Data Using Tidy Data Principles

Upsample cells into the closest cluster in a reference dataset

Description

Usage

Arguments

Value

Examples

Related to tof_upsample_distance in keyes-timothy/tidytof...

R Package Documentation

Browse R Packages

We want your feedback!

keyes-timothy/tidytof
Analyze High-dimensional Cytometry Data Using Tidy Data Principles

tof_upsample_distance: Upsample cells into the closest cluster in a reference...
In keyes-timothy/tidytof: Analyze High-dimensional Cytometry Data Using Tidy Data Principles