umap_clustering: Perform UMAP dimensionality reduction and HDBSCAN clustering...

View source: R/clustering.R

umap_clustering R Documentation

Perform UMAP dimensionality reduction and HDBSCAN clustering on copy number data

Description

This function takes copy number data, performs UMAP dimensionality reduction, and then applies HDBSCAN clustering to identify cell populations. It can handle both standard copy number data and haplotype-specific copy number (HSCN) data.

Usage

umap_clustering(
  CNbins,
  n_neighbors = 10,
  min_dist = 0.1,
  minPts = 30,
  seed = NULL,
  field = "copy",
  umapmetric = "correlation",
  hscn = FALSE,
  pca = NULL
)

Arguments

CNbins

A data frame containing copy number data. Must include columns for 'cell_id' and the specified 'field'.
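
As a rough illustration of the expected input shape, the sketch below builds a toy long-format data frame with one row per cell per genomic bin. Only 'cell_id' and 'copy' are documented on this page; the 'chr', 'start' and 'end' bin coordinates are assumptions added for illustration, and the values are random placeholders.

```r
# Toy long-format input: one row per cell per genomic bin.
# 'chr', 'start' and 'end' are assumed bin coordinates, not confirmed by this page.
bins <- data.frame(
  chr = c("1", "1", "2"),
  start = c(1, 500001, 1),
  end = c(500000, 1000000, 500000)
)
cells <- c("cellA", "cellB")

# merge() with no shared column names returns the cross join: 2 cells x 3 bins.
CNbins <- merge(data.frame(cell_id = cells), bins)

# Placeholder copy number values (random, for illustration only).
set.seed(42)
CNbins$copy <- round(runif(nrow(CNbins), min = 1, max = 4), 2)

# Check that the two documented columns are present.
stopifnot(all(c("cell_id", "copy") %in% names(CNbins)))
```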

n_neighbors

Integer. The number of neighbors to consider in UMAP. Default is 10.

min_dist

Numeric. The minimum distance between points in the low-dimensional UMAP embedding. Default is 0.1.

minPts

Integer. The minimum number of points to form a cluster in HDBSCAN. Default is 30.

seed

Integer or NULL. Random seed for reproducibility. Default is NULL.

field

Character. The column name in 'CNbins' to use for copy number values. Default is "copy".

umapmetric

Character. The distance metric to use in UMAP. Default is "correlation".

hscn

Logical. Whether to use haplotype-specific copy number data. Default is FALSE.

pca

Integer or NULL. Number of principal components to use in UMAP. If NULL (the default), PCA is not used.

Details

The function performs the following steps:

1. Creates a copy number matrix from the input data.
2. Applies UMAP dimensionality reduction.
3. Performs HDBSCAN clustering on the UMAP results.
4. Generates a phylogenetic tree from the clustering results.

If 'hscn' is TRUE, the function expects columns 'copy' and 'BAF' in 'CNbins', and creates separate matrices for the A and B alleles.
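
One plausible way to derive per-allele matrices from 'copy' and 'BAF' is sketched below. It assumes that BAF is the B-allele fraction of the total copy number in each bin, so that B = copy * BAF and A = copy - B. The 'bin' label, the 'copyA'/'copyB' columns and the 'to_matrix' helper are hypothetical, and the package may construct its matrices differently.

```r
# Assumption: BAF is the B-allele fraction of the total copy number in each bin.
CNbins_hscn <- data.frame(
  cell_id = c("cellA", "cellA", "cellB", "cellB"),
  bin = c("bin1", "bin2", "bin1", "bin2"),  # hypothetical bin label
  copy = c(2, 3, 4, 2),
  BAF = c(0.5, 1 / 3, 0.25, 0)
)

# Split the total copy number into allele-specific values.
CNbins_hscn$copyB <- CNbins_hscn$copy * CNbins_hscn$BAF
CNbins_hscn$copyA <- CNbins_hscn$copy - CNbins_hscn$copyB

# Hypothetical helper: reshape one value column into a cells-by-bins matrix.
to_matrix <- function(df, value) {
  tapply(df[[value]], list(df$cell_id, df$bin), sum)
}

mat_A <- to_matrix(CNbins_hscn, "copyA")
mat_B <- to_matrix(CNbins_hscn, "copyB")

# Placing the two allele matrices side by side is one way to pass both to UMAP.
mat_AB <- cbind(mat_A, mat_B)
```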

The function automatically adjusts 'n_neighbors' if there are too few cells. If UMAP fails, it attempts to rerun with small jitter added to the data points. The function will reduce 'minPts' if only one cluster is initially found.
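
The steps and safeguards described above can be sketched on random data as follows. This is a minimal illustration built on uwot::umap and dbscan::hdbscan, not the package source: the rule for capping 'n_neighbors' and the halving of 'minPts' are assumptions chosen for the example, 'cap_neighbors' is a hypothetical helper, and the jitter retry and tree construction are omitted.

```r
# Sketch only: mirrors the documented steps on random data.
set.seed(1)
n_cells <- 60
n_bins <- 40

# Step 1: cells-by-bins copy number matrix (random values stand in for real data).
cn_matrix <- matrix(
  sample(1:4, n_cells * n_bins, replace = TRUE),
  nrow = n_cells,
  dimnames = list(paste0("cell", seq_len(n_cells)), NULL)
)

# Documented safeguard: reduce n_neighbors when there are too few cells.
# The exact rule is an assumption (here: at most n_cells - 1).
cap_neighbors <- function(n_neighbors, n_cells) min(n_neighbors, n_cells - 1L)
n_neighbors <- cap_neighbors(10L, nrow(cn_matrix))

if (requireNamespace("uwot", quietly = TRUE) &&
    requireNamespace("dbscan", quietly = TRUE)) {
  # Step 2: UMAP embedding with the documented defaults.
  embedding <- uwot::umap(cn_matrix, n_neighbors = n_neighbors,
                          min_dist = 0.1, metric = "correlation")

  # Step 3: HDBSCAN, lowering minPts while only one cluster label is found.
  # Halving is an arbitrary choice for this sketch.
  minPts <- 30L
  repeat {
    hdb <- dbscan::hdbscan(embedding, minPts = minPts)
    if (length(unique(hdb$cluster)) > 1 || minPts <= 2L) break
    minPts <- max(2L, minPts %/% 2L)
  }

  # Combine coordinates and cluster labels, one row per cell.
  clustering <- data.frame(
    cell_id = rownames(cn_matrix),
    umap1 = embedding[, 1],
    umap2 = embedding[, 2],
    cluster = hdb$cluster
  )
}
```

The guard around the UMAP and HDBSCAN calls lets the sketch run even when 'uwot' or 'dbscan' is not installed; in that case only the matrix and the capped 'n_neighbors' are computed.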

Value

A list containing:

clustering

A data frame with UMAP coordinates and cluster assignments for each cell.

hdbscanresults

The results of the HDBSCAN clustering.

umapresults

The results of the UMAP dimensionality reduction.

tree

A phylogenetic tree object representing the hierarchical structure of the clusters.
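
A short sketch of how the returned list might be inspected is given below. The object is a mock with the documented element names; the column names inside 'clustering' ('umap1', 'umap2', 'clone_id') and all values are assumptions used as placeholders.

```r
# Mock result with the documented element names; contents are placeholders.
res <- list(
  clustering = data.frame(
    cell_id = c("cellA", "cellB", "cellC"),
    umap1 = c(0.1, 0.2, 5.0),
    umap2 = c(1.0, 1.1, -3.0),
    clone_id = c("A", "A", "B")  # assumed name of the cluster column
  ),
  hdbscanresults = NULL,
  umapresults = NULL,
  tree = NULL
)

# Number of cells assigned to each cluster.
cluster_sizes <- table(res$clustering$clone_id)

# Cells belonging to one cluster of interest.
cells_in_A <- res$clustering$cell_id[res$clustering$clone_id == "A"]
```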


shahcompbio/signals documentation built on Jan. 11, 2025, 2:20 a.m.