View source: R/exportedUtils.R

runUmapDiscovrExperiment    R Documentation

Calculate UMAP coordinates for the cells from a discovrExperiment object

Description

This function generates a UMAP from the cells in a discovrExperiment object. To do this, it extracts the marker scores for each cell, z-scores the expression values within each sample (similar to the briDiscovr metaclustering process), optionally downsamples the cells to speed up the process (with downsampling frequency tunable at the cell population level), and then runs the UMAP algorithm. The UMAP algorithm is run using the umap package, which is a wrapper for the uwot package. Results can be made reproducible by passing a non-NULL value for seed. This function returns a data frame with the UMAP coordinates for each cell, as well as the original cell population, sample information, and metacluster if available. The outputs are intended to be visualized using plotting software such as ggplot2.
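The within-sample z-scoring step described above can be illustrated with a minimal base R sketch. The data frame, the marker names, and the helper zScoreWithinSample are all made up for illustration; this is not the package's internal code, only one way to express the same idea.

```r
# Toy marker scores: 6 cells from 2 samples, 2 markers (all values made up).
cells <- data.frame(
  sample = c("s1", "s1", "s1", "s2", "s2", "s2"),
  CD4    = c(1, 2, 3, 10, 20, 30),
  CD8    = c(5, 5, 8, 0, 1, 2)
)
markers <- c("CD4", "CD8")

# Hypothetical helper: z-score a vector within groups by subtracting the
# per-group mean and dividing by the per-group standard deviation.
zScoreWithinSample <- function(x, group) {
  ave(x, group, FUN = function(v) (v - mean(v)) / sd(v))
}

# Apply the helper to every marker column, grouping by sample.
zScores <- cells
zScores[markers] <- lapply(cells[markers], zScoreWithinSample,
                           group = cells$sample)
```

After this step each marker has mean 0 and standard deviation 1 within every sample, so differences in overall staining intensity between samples do not dominate the embedding.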

Usage

runUmapDiscovrExperiment(
  experiment,
  umapMarkers = NULL,
  downsampleFreq = c(parentPopulation = 100, childPopulations = 1),
  seed = NULL,
  returnUmapObject = FALSE,
  returnExpressionZScores = FALSE,
  ...
)
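A call might look like the following sketch. The object name myExperiment is hypothetical and stands for a discovrExperiment you have already built; the guard simply skips the call when the package or that object is not available. The argument values are arbitrary illustrations, not recommendations.

```r
# 'myExperiment' is a hypothetical, previously created discovrExperiment.
if (requireNamespace("briDiscovr", quietly = TRUE) && exists("myExperiment")) {
  umapResult <- briDiscovr::runUmapDiscovrExperiment(
    experiment       = myExperiment,
    # keep 1/50 of the parent population and all child-population cells
    downsampleFreq   = c(parentPopulation = 50, childPopulations = 1),
    seed             = 42,   # any non-NULL value, for reproducibility
    returnUmapObject = TRUE  # return a list instead of a bare data frame
  )

  # With returnUmapObject = TRUE, the cell-level data frame is in 'data'.
  umapData <- umapResult$data
}
```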

Arguments

experiment

A discovrExperiment created using setupDiscovrExperiment, clusterDiscovrExperiment, or metaclusterDiscovrExperiment. In order to return metacluster numbers for each cell, its status must be "metaclustered".

umapMarkers

A character vector, the markers to be used for UMAP. For the default value, NULL, the function extracts the set of markers from the "clusteringMarkers" element of the discovrExperiment object.

downsampleFreq

numeric, specifying how to downsample the cells prior to running UMAP. As the default illustrates, a frequency of n retains roughly 1/n of the cells, so a value of 1 retains all cells. The vector can take several forms. If a single value is provided, all populations are downsampled at this frequency. If a vector of length 2 is provided (optionally with elements named "parentPopulation" and "childPopulations"), the "parentPopulation" or first element is used as the frequency for the parent population (extracted from the discovrExperiment object), and the "childPopulations" or second element is used as the frequency for the child populations. If a vector named by cell population is provided, the names must match the cell populations, and each value is the frequency at which that population is downsampled. If NULL, no downsampling is performed. The default is c("parentPopulation" = 100, "childPopulations" = 1), which retains all cells from child populations and subsets the parent population to 1/100. Note that downsampling is based on the order of the cells in the discovrExperiment object, so any change that alters the order of cells will change which cells are retained and make the downsampling results non-reproducible.
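To make the order dependence concrete, here is a minimal base R sketch of order-based downsampling with a per-population frequency vector. The selection rule (keep the 1st, (n+1)th, (2n+1)th, ... cell of each population) is an assumption chosen for illustration, since the exact rule the function uses is not stated here. The population names and the cell table are made up.

```r
# Toy cell table in a fixed order; population names are made up.
cells <- data.frame(
  cellId     = 1:12,
  population = c(rep("CD8", 10), rep("tetramerPos", 2))
)

# Named vector: one frequency per population (n keeps about 1 of every n cells).
downsampleFreq <- c(CD8 = 5, tetramerPos = 1)

# Position of each cell within its own population, in table order.
withinPopIndex <- ave(seq_len(nrow(cells)), cells$population, FUN = seq_along)

# Look up each cell's frequency by population name.
cellFreq <- unname(downsampleFreq[as.character(cells$population)])

# Assumed rule: keep every n-th cell of each population, starting with the first.
keep <- (withinPopIndex - 1) %% cellFreq == 0
downsampled <- cells[keep, ]
```

Because the rule depends only on row order, reordering the rows of cells before this step would select a different subset, which is the reproducibility caveat noted above.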

seed

(default: NULL) numeric, the seed to be passed to set.seed to make the UMAP (more) reproducible. If NULL, no seed is set.

returnUmapObject

(default: FALSE) logical, if TRUE, returns the full UMAP output object and the data frame of cell information as a list. If FALSE, returns only the data frame of cell information.

returnExpressionZScores

(default: FALSE) logical, if TRUE, includes the z-scored expression values for the markers used in the UMAP in the output data frame. If FALSE, the data frame only contains the UMAP coordinates and cell information.

...

optional arguments passed to umap::umap.

Value

A data frame containing the UMAP coordinates for each cell, as columns 'UMAP1' and 'UMAP2', and the original cell population and sample information. The data frame also contains the metacluster information, if available. If returnUmapObject is TRUE, returns a list with the data frame of cell information as element 'data' and the UMAP output object as element 'umapObject'. If returnExpressionZScores is TRUE, the data frame also contains the z-scored expression values for the markers used in the UMAP.
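Because the return value is either a data frame or a list, downstream code may want to normalize it before plotting. The sketch below uses a mock result in place of real output. Only the column names UMAP1 and UMAP2 and the list element names data and umapObject come from this page; the helper getUmapData and the mock values are hypothetical. The ggplot2 step is skipped if that package is not installed.

```r
# Mock stand-in for the function's output (values are made up).
mockData <- data.frame(
  UMAP1 = c(0.1, -1.2, 2.3),
  UMAP2 = c(1.0, 0.4, -0.7)
)
mockResult <- list(data = mockData, umapObject = NULL)

# Hypothetical helper: return the cell-level data frame from either shape.
# A data frame is itself a list, so test for the data frame case first.
getUmapData <- function(result) {
  if (is.data.frame(result)) result else result$data
}

umapData <- getUmapData(mockResult)

# Build (but do not print) a scatter plot of the coordinates.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  umapPlot <- ggplot2::ggplot(umapData, ggplot2::aes(x = UMAP1, y = UMAP2)) +
    ggplot2::geom_point()
}
```

With real output, the population, sample, or metacluster columns could be mapped to color in the same plot; their column names are not given here, so they are left out of the sketch.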

Author(s)

Matthew J Dufort, mdufort@benaroyaresearch.org


BenaroyaResearch/briDiscovr documentation built on March 15, 2024, 12:31 a.m.