tof_find_knn    R Documentation

Find the k-nearest neighbors of each cell in a high-dimensional cytometry dataset.

Description

Find the k-nearest neighbors of each cell in a high-dimensional cytometry dataset.

Usage

tof_find_knn(
  .data,
  k = min(10, nrow(.data)),
  distance_function = c("euclidean", "cosine", "l2", "ip"),
  .query,
  ...
)

Arguments

.data

A 'tof_tibble' or 'tibble' in which each row represents a cell and each column represents a high-dimensional cytometry measurement.

k

An integer indicating the number of nearest neighbors to return for each cell.

distance_function

A string indicating which distance function to use for the nearest-neighbor calculation. Options are "euclidean" (the default), "cosine", "l2", and "ip", as listed in the usage signature.
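To make the difference between the "euclidean" and "cosine" options concrete, the following base R sketch computes both for two hypothetical cells. It assumes the usual convention that cosine distance is one minus cosine similarity; the helper names and marker values are illustrative and are not part of tidytof.

```r
# Two hypothetical cells measured on four markers.
# cell_b has the same marker profile as cell_a at twice the intensity.
cell_a <- c(cd45 = 1, cd38 = 2, cd34 = 0, cd19 = 2)
cell_b <- c(cd45 = 2, cd38 = 4, cd34 = 0, cd19 = 4)

# Euclidean distance: straight-line distance in marker space
euclidean_distance <- function(x, y) {
  sqrt(sum((x - y)^2))
}

# Cosine distance (assumed convention): 1 - cosine similarity
cosine_distance <- function(x, y) {
  1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

euclidean_dist <- euclidean_distance(cell_a, cell_b)
cosine_dist <- cosine_distance(cell_a, cell_b)

print(euclidean_dist)
print(cosine_dist)
```

In this sketch the Euclidean distance is nonzero because the two cells differ in overall intensity, while the cosine distance is zero because their marker profiles point in the same direction. This is one reason to prefer "cosine" when relative marker patterns matter more than absolute signal.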

.query

A set of cells to be queried against .data (i.e. a set of cells for which to find nearest neighbors within .data). Defaults to .data itself, i.e. finding nearest neighbors for all cells in .data.
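The idea of querying one set of cells against another can be sketched with a brute-force search in base R. This is only an illustration of the semantics, assuming one result row per query cell and neighbor indices that refer to rows of the reference data; brute_force_query_knn is a hypothetical helper, not the method tof_find_knn uses internally.

```r
# For each row of `query`, find the k closest rows of `reference`.
# Hypothetical helper for illustration only; not part of tidytof.
brute_force_query_knn <- function(reference, query = reference, k = 1) {
  reference <- as.matrix(reference)
  query <- as.matrix(query)
  ids <- matrix(NA_integer_, nrow = nrow(query), ncol = k)
  dists <- matrix(NA_real_, nrow = nrow(query), ncol = k)
  for (i in seq_len(nrow(query))) {
    # Euclidean distance from query cell i to every reference cell
    d <- sqrt(rowSums(sweep(reference, 2, query[i, ])^2))
    nearest <- order(d)[seq_len(k)]
    ids[i, ] <- nearest       # row indices into `reference`
    dists[i, ] <- d[nearest]  # matching distances
  }
  list(neighbor_ids = ids, neighbor_distances = dists)
}

# Three reference cells and two query cells (hypothetical values)
reference_cells <- data.frame(cd45 = c(0, 5, 10), cd38 = c(0, 5, 10))
query_cells <- data.frame(cd45 = c(1, 9), cd38 = c(1, 9))

query_knn <- brute_force_query_knn(reference_cells, query_cells, k = 2)
print(query_knn$neighbor_ids)
```

When the query defaults to the reference data in this sketch, every cell is returned as its own nearest neighbor at distance zero. Whether tof_find_knn treats self-matches the same way is not stated here, so check the returned distances before relying on it.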

...

Optional additional arguments to pass to hnsw_knn (from the RcppHNSW package).

Value

A list with two elements, "neighbor_ids" and "neighbor_distances", both of which are n by k matrices (in which n is the number of cells in the input '.data'). The [i,j]-th entry of "neighbor_ids" represents the row index of the j-th nearest neighbor of the cell in the i-th row of '.data'. The [i,j]-th entry of "neighbor_distances" represents the distance between those two cells according to 'distance_function'.
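As a rough illustration of this layout, the base R sketch below builds a list of the same shape for five hypothetical cells and reads one entry from each matrix. It uses an exact pairwise search and excludes each cell from its own neighbor list; both choices are assumptions made for the example and may differ from what tof_find_knn returns.

```r
# Five hypothetical cells measured on a single marker
cells <- data.frame(cd45 = c(0, 1, 3, 7, 12))
k <- 2

# Full pairwise Euclidean distance matrix (5 x 5)
pairwise <- as.matrix(dist(cells, method = "euclidean"))
diag(pairwise) <- Inf  # assumption: a cell is not its own neighbor

# Build the two n-by-k matrices in the documented layout
neighbor_ids <- t(apply(pairwise, 1, function(d) order(d)[seq_len(k)]))
neighbor_distances <- t(apply(pairwise, 1, function(d) unname(sort(d))[seq_len(k)]))
knn_result <- list(
  neighbor_ids = neighbor_ids,
  neighbor_distances = neighbor_distances
)

# Row index of the 2nd-nearest neighbor of the cell in row 3
second_neighbor_id <- knn_result$neighbor_ids[3, 2]
# Distance between the cell in row 3 and that neighbor
second_neighbor_distance <- knn_result$neighbor_distances[3, 2]

# Look up the measurements of all neighbors of the cell in row 3
cells[knn_result$neighbor_ids[3, ], , drop = FALSE]
```

Because "neighbor_ids" holds row indices, it can be used directly to subset the original data, as in the last line of the sketch.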

Examples

sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cd34 = rnorm(n = 1000),
        cd19 = rnorm(n = 1000)
    )

# Find the 10 nearest neighbors of each cell in the dataset
tof_find_knn(
    .data = sim_data,
    k = 10,
    distance_function = "euclidean"
)

# Find the 10 approximate nearest neighbors
tof_find_knn(
    .data = sim_data,
    k = 10,
    distance_function = "euclidean"
)


keyes-timothy/tidytof documentation built on Aug. 28, 2024, 8:37 a.m.