FindAnchors.STACAS: Find integration anchors using STACAS

FindAnchors.STACAS    R Documentation

Find integration anchors using STACAS

Description

This function computes anchors between datasets for single-cell data integration. It is based on the Seurat function FindIntegrationAnchors, but is optimized for integrating heterogeneous datasets that contain only partially overlapping cell subsets. It also computes an rPCA-based measure of distance between candidate anchors, which is combined with Seurat's anchor weight through the factor alpha. Prior knowledge about cell types can optionally be provided to guide anchor finding, via the metadata column named in cell.labels. This annotation can be incomplete (set it to NA for cells of unknown type) and is used to penalize anchor pairs with inconsistent annotation. The set of anchors returned by this function can then be passed to IntegrateData.STACAS for dataset integration.
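As a rough illustration of how a distance-based weight might be blended with an anchor score through alpha, the sketch below uses toy numbers in base R. The logistic form, the linear blend, and all variable names (toy.dist, toy.score, dist.weight, combined) are assumptions made for illustration only; the package's internal formula may differ. The anchor.coverage and correction.scale values play the roles described for them under Arguments.

```r
# Toy illustration only: NOT the STACAS implementation.
set.seed(123)

# Hypothetical candidate anchors: an rPCA distance and a score in [0, 1]
toy.dist  <- rgamma(200, shape = 2, rate = 4)
toy.score <- runif(200)

alpha            <- 0.8  # weight on the distance term
anchor.coverage  <- 0.5  # quantile of the distance distribution used as center
correction.scale <- 2    # multiplies the SD of the distance distribution

# Logistic parameters derived from the distance distribution
center <- unname(quantile(toy.dist, probs = anchor.coverage))
width  <- correction.scale * sd(toy.dist)

# Assumed form: smaller distances receive larger weights
dist.weight <- 1 / (1 + exp((toy.dist - center) / width))

# Assumed linear blend of the two terms, controlled by alpha
combined <- alpha * dist.weight + (1 - alpha) * toy.score

summary(combined)
```

With alpha close to 1 the combined value is dominated by the distance term, and with alpha close to 0 it falls back to the original score.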

Usage

FindAnchors.STACAS(
  object.list = NULL,
  assay = NULL,
  reference = NULL,
  min.sample.size = 100,
  max.seed.objects = 10,
  anchor.features = 1000,
  genesBlockList = "default",
  dims = 30,
  k.anchor = 5,
  k.score = 30,
  alpha = 0.8,
  anchor.coverage = 0.5,
  correction.scale = 2,
  cell.labels = NULL,
  label.confidence = 1,
  scale.data = FALSE,
  seed = 123,
  verbose = TRUE
)

Arguments

object.list

A list of Seurat objects. Anchors will be determined between pairs of objects, and can subsequently be used for Seurat dataset integration.

assay

A vector containing the assay to use for each Seurat object in object.list. If not specified, uses the default assay.

reference

A vector specifying the object(s) to use as reference during integration. If NULL (default), anchors are found between all pairs of objects, with no reference. Otherwise, the corresponding objects in object.list are used as references: anchors are first found between each query and each reference, the references are then integrated through pairwise integration, and each query is finally mapped to the integrated reference.

min.sample.size

Minimum number of cells per sample. Objects with fewer than this number of cells are not integrated.

max.seed.objects

Number of objects to use as seeds to build the integration tree. Automatically chooses the largest max.seed.objects datasets; the remaining datasets will be added sequentially to the reference.

anchor.features

Can be either:

  • A numeric value. This will call FindVariableFeatures.STACAS to identify that number of features that are consistently variable across datasets.

  • A pre-calculated vector of integration features to be used for anchor search.

genesBlockList

If anchor.features is numeric, genesBlockList optionally takes a vector, or a list of vectors, of gene names. These genes will be removed from the integration features. If set to "default", STACAS uses its internal list (data("genes.blocklist")). This is useful to mitigate the effect of genes associated with technical artifacts or batch effects (e.g. mitochondrial, heat-shock response).

dims

The number of dimensions used for PCA reduction

k.anchor

The number of neighbors to use for identifying anchors

k.score

The number of neighbors to use for scoring anchors

alpha

Weight on rPCA distance for rescoring (between 0 and 1).

anchor.coverage

Center of the logistic function, given as a quantile of the rPCA distance distribution.

correction.scale

Scale factor for the logistic function (multiplied by the SD of the rPCA distance distribution).

cell.labels

Name of a metadata column storing cell type annotations (optional). These annotations are taken into account for semi-supervised alignment. Not all cells need to be annotated: set unannotated cells to NA or 'unknown' in this column. Cells with NA or 'unknown' labels will not be penalized in semi-supervised alignment.

label.confidence

How much you trust the provided cell labels (from 0 to 1).

scale.data

Whether to rescale expression data before PCA reduction.

seed

Random seed for probabilistic anchor acceptance

verbose

Whether to print all output.
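The label-related arguments (cell.labels, label.confidence, seed) can be pictured with a toy table of anchor pairs. In this sketch, pairs whose two labels disagree are rejected with probability label.confidence, while pairs involving NA or 'unknown' are never penalized. The rejection rule and the column names (label1, label2, kept) are assumptions for illustration, not the package's code.

```r
# Toy illustration only: NOT the STACAS implementation.
label.confidence <- 0.7
seed <- 123

# Hypothetical anchor pairs, with the labels of the two cells they connect
anchors <- data.frame(
  label1 = c("CD4T", "CD8T", "B", NA, "NK", "CD4T"),
  label2 = c("CD4T", "B", "B", "CD8T", "unknown", "NK"),
  stringsAsFactors = FALSE
)

# A label counts as known if it is neither NA nor 'unknown'
is.known <- function(x) !is.na(x) & x != "unknown"

# A pair is inconsistent only if both cells are annotated and the labels differ
both.known   <- is.known(anchors$label1) & is.known(anchors$label2)
inconsistent <- both.known & anchors$label1 != anchors$label2

# Assumed rule: reject each inconsistent pair with probability label.confidence
set.seed(seed)
reject <- inconsistent & (runif(nrow(anchors)) < label.confidence)
anchors$kept <- !reject

print(anchors)
```

Under this reading, label.confidence = 1 would discard every inconsistent pair and label.confidence = 0 would ignore the labels entirely; fixing seed makes the random draws reproducible.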

Value

Returns an AnchorSet object, which can be passed to IntegrateData.STACAS
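A typical call might look like the sketch below. It assumes a hypothetical list of Seurat objects named obj.list, each preprocessed as the package expects, and a hypothetical metadata column "cell_type" holding (possibly incomplete) annotations. The guard keeps the snippet from failing when STACAS is not installed or obj.list does not exist.

```r
# Sketch under stated assumptions; `obj.list` and "cell_type" are hypothetical.
n.dims       <- 30
label.column <- "cell_type"  # set to NULL for fully unsupervised anchor finding

# Only run when STACAS is installed and a list of Seurat objects is available
if (requireNamespace("STACAS", quietly = TRUE) && exists("obj.list")) {
  anchors <- STACAS::FindAnchors.STACAS(
    object.list     = obj.list,
    anchor.features = 1000,
    dims            = n.dims,
    cell.labels     = label.column
  )

  # The returned AnchorSet is then handed to the integration step
  integrated <- STACAS::IntegrateData.STACAS(anchors)
}
```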


carmonalab/STACAS documentation built on Feb. 3, 2024, 7:42 a.m.