Run.STACAS: Run the STACAS integration pipeline

Run.STACAS    R Documentation

Run the STACAS integration pipeline

Description

This function is a wrapper that runs the steps required to integrate single-cell datasets using STACAS: 1) finding integration anchors; 2) calculating the sample tree that determines the order of dataset integration; 3) correcting batch effects and integrating the datasets.

Usage

Run.STACAS(
  object.list = NULL,
  assay = NULL,
  new.assay.name = "integrated",
  reference = NULL,
  max.seed.objects = 10,
  min.sample.size = 100,
  anchor.features = 1000,
  genesBlockList = "default",
  dims = 30,
  k.anchor = 5,
  k.score = 30,
  k.weight = 100,
  alpha = 0.8,
  anchor.coverage = 0.5,
  correction.scale = 2,
  cell.labels = NULL,
  label.confidence = 1,
  scale.data = FALSE,
  hclust.method = c("single", "complete", "ward.D2", "average"),
  seed = 123,
  verbose = FALSE
)
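
As an illustration of the call signature above, a minimal sketch might look like the following. It assumes a hypothetical, already normalized Seurat object with a metadata column named "batch"; the helper name integrate_by_batch and the column name are illustrative and not part of STACAS. Seurat::SplitObject is used only to build object.list.

```r
# Hypothetical helper (not part of STACAS): split a Seurat object by batch
# and integrate the pieces with Run.STACAS.
integrate_by_batch <- function(seu, batch.col = "batch") {
  # One Seurat object per value of the batch metadata column
  obj.list <- Seurat::SplitObject(seu, split.by = batch.col)

  STACAS::Run.STACAS(
    object.list = obj.list,
    anchor.features = 1000,  # number of integration features to select
    dims = 30                # number of PCA dimensions
  )
}
```

Calling integrate_by_batch(seu) would then return the integrated Seurat object described under Value.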

Arguments

object.list

A list of Seurat objects. Anchors will be determined between pairs of objects, and can subsequently be used for Seurat dataset integration.

assay

A vector containing the assay to use for each Seurat object in object.list. If not specified, uses the default assay.

new.assay.name

Name of the assay in which the integrated data will be stored.

reference

A vector specifying the object/s to be used as a reference during integration. If NULL (default), all pairwise anchors are found (no reference/s). If not NULL, the corresponding objects in object.list will be used as references. When using a set of specified references, anchors are first found between each query and each reference. The references are then integrated through pairwise integration. Each query is then mapped to the integrated reference.
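
A reference-based run could be sketched as below. The text above only says "a vector specifying the object/s"; this sketch assumes, by analogy with Seurat's FindIntegrationAnchors, that references are given as positions in object.list. The helper name run_with_reference is hypothetical.

```r
# Hypothetical helper: integrate using selected datasets as references.
# Assumption: `reference` takes integer positions in object.list.
run_with_reference <- function(obj.list, ref.index = 1) {
  stopifnot(all(ref.index %in% seq_along(obj.list)))

  STACAS::Run.STACAS(
    object.list = obj.list,
    reference = ref.index  # e.g. c(1, 2) to use the first two objects
  )
}
```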

max.seed.objects

Number of objects to use as seeds to build the integration tree. Automatically chooses the largest max.seed.objects datasets; the remaining datasets will be added sequentially to the reference.
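
The seed selection described above can be pictured with a small base R sketch. The dataset names and cell counts are made up, and ranking datasets by number of cells is an assumption about what "largest" means here.

```r
# Hypothetical cell counts per dataset (names and numbers are made up)
n.cells <- c(A = 5200, B = 800, C = 3100, D = 150, E = 9400)
max.seed.objects <- 3

# Rank datasets from largest to smallest
ranked <- names(sort(n.cells, decreasing = TRUE))

# The largest datasets act as seeds; the rest would be added sequentially
seeds <- head(ranked, max.seed.objects)
remaining <- setdiff(ranked, seeds)

print(seeds)      # "E" "A" "C"
print(remaining)  # "B" "D"
```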

anchor.features

Can be either:

  • A numeric value. This will call Seurat::SelectIntegrationFeatures to identify anchor.features genes for anchor finding.

  • A pre-calculated vector of integration features to be used for anchor search.
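
The two forms of anchor.features could be used as in the sketch below. The helper names are hypothetical; the pre-calculated vector is obtained here with Seurat::SelectIntegrationFeatures, but any character vector of gene names would serve the same role.

```r
# Form 1: a number; feature selection is left to Run.STACAS
run_with_n_features <- function(obj.list, n = 1000) {
  STACAS::Run.STACAS(object.list = obj.list, anchor.features = n)
}

# Form 2: a pre-calculated character vector of feature names
run_with_feature_vector <- function(obj.list, n = 1000) {
  feats <- Seurat::SelectIntegrationFeatures(
    object.list = obj.list,
    nfeatures = n
  )
  STACAS::Run.STACAS(object.list = obj.list, anchor.features = feats)
}
```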

genesBlockList

If anchor.features is numeric, genesBlockList optionally takes a list of vectors of gene names. These genes will be removed from the integration features. If set to "default", STACAS uses its internal list, data("genes.blocklist"). This helps mitigate the effect of genes associated with technical artifacts or batch effects (e.g. mitochondrial or heat-shock response genes).
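
A custom blocklist might be passed as in the sketch below. The list names, the choice of genes, and the helper name run_with_blocklist are illustrative only; the gene symbols are examples of human mitochondrial and heat-shock genes, not the contents of the STACAS default list.

```r
# Illustrative blocklist: a named list of character vectors (gene symbols)
my.blocklist <- list(
  mito = c("MT-CO1", "MT-ND1", "MT-CYB"),
  heat.shock = c("HSPA1A", "HSPA1B", "DNAJB1")
)

# Hypothetical helper: exclude the blocklisted genes from feature selection
run_with_blocklist <- function(obj.list, blocklist = my.blocklist) {
  STACAS::Run.STACAS(
    object.list = obj.list,
    anchor.features = 1000,     # numeric, so the blocklist is applied
    genesBlockList = blocklist
  )
}
```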

dims

The number of dimensions used for PCA reduction

k.anchor

The number of neighbors to use for identifying anchors

k.score

The number of neighbors to use for scoring anchors

k.weight

Number of neighbors for local anchor weighting. Set k.weight="max" to disable local weighting

alpha

Weight on rPCA distance for rescoring (between 0 and 1).

anchor.coverage

Center of logistic function, based on quantile value of rPCA distance distribution

correction.scale

Scale factor for logistic function (multiplied by SD of rPCA distance distribution)
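
To give an intuition for how anchor.coverage and correction.scale could shape a logistic function, here is a base R sketch on simulated distances. It is only one plausible reading of the two descriptions above: the functional form, the sign convention (larger distance gives smaller weight), and the simulated data are all assumptions, and the actual STACAS implementation may differ.

```r
set.seed(123)

# Simulated stand-in for rPCA distances between anchor pairs (assumption)
rpca.dist <- runif(200)

anchor.coverage <- 0.5
correction.scale <- 2

# Center: the anchor.coverage quantile of the distance distribution
center <- unname(stats::quantile(rpca.dist, probs = anchor.coverage))

# Width: correction.scale multiplied by the SD of the distances
width <- correction.scale * stats::sd(rpca.dist)

# Assumed logistic form: weight 0.5 at the center, lower for larger distances
logistic_weight <- function(d, center, width) {
  1 / (1 + exp((d - center) / width))
}

w <- logistic_weight(rpca.dist, center, width)
print(range(w))
```

Under these assumptions, raising anchor.coverage shifts the center towards larger distances, and raising correction.scale flattens the curve.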

cell.labels

Name of a metadata column storing cell type annotations. These will be taken into account for semi-supervised alignment (optional). Cells annotated as NA or NULL will not be penalized in semi-supervised alignment.

label.confidence

How much you trust the provided cell labels (from 0 to 1).
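
A semi-supervised run might be set up as sketched below. The metadata column name "celltype", the placeholder value "unknown", the confidence value, and the helper name run_semisupervised are hypothetical. The sketch converts placeholder labels to NA so that, as described for cell.labels, those cells are not penalized.

```r
# Hypothetical helper: semi-supervised integration with partial labels
run_semisupervised <- function(obj.list,
                               label.col = "celltype",
                               unknown = "unknown",
                               confidence = 0.8) {
  # Recode placeholder labels as NA in every object
  obj.list <- lapply(obj.list, function(obj) {
    labels <- as.character(obj@meta.data[[label.col]])
    labels[labels %in% unknown] <- NA
    obj@meta.data[[label.col]] <- labels
    obj
  })

  STACAS::Run.STACAS(
    object.list = obj.list,
    cell.labels = label.col,
    label.confidence = confidence  # between 0 and 1
  )
}
```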

scale.data

Whether to rescale expression data before PCA reduction.

hclust.method

Clustering method for the integration tree (one of "single", "complete", "ward.D2", "average").
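
The effect of the clustering method can be explored with base R alone. The sketch below builds a tree from a made-up dissimilarity matrix between four hypothetical datasets; that the method names are passed on to stats::hclust is an assumption, although the names in the Usage section match its options.

```r
# Toy dissimilarities between four hypothetical datasets (values made up)
d <- matrix(
  c(0.0, 0.2, 0.7, 0.8,
    0.2, 0.0, 0.6, 0.9,
    0.7, 0.6, 0.0, 0.3,
    0.8, 0.9, 0.3, 0.0),
  nrow = 4,
  dimnames = list(paste0("S", 1:4), paste0("S", 1:4))
)

hclust.method <- "ward.D2"

# Hierarchical clustering; the merge order suggests an integration order
tree <- stats::hclust(stats::as.dist(d), method = hclust.method)
print(tree$merge)
```

Here S1 and S2, the most similar pair, are merged first; swapping in "single", "complete" or "average" can change the later merges.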

seed

Random seed for probabilistic anchor acceptance

verbose

Whether to print all output.

Value

Returns a Seurat object with a new integrated assay. In addition, the centered and scaled data for the variable features are stored in the scale.data slot, and the PCA computed on these batch-corrected scaled data is stored in the "pca" reduction slot.
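
The returned object could be inspected and used downstream roughly as follows. The helper name embed_integrated is hypothetical, and running UMAP on the "pca" reduction is a common next step rather than something this page prescribes.

```r
# Hypothetical helper: check the integrated assay and compute a UMAP
# from the batch-corrected PCA stored in the returned object.
embed_integrated <- function(integrated, assay.name = "integrated", dims = 30) {
  # The new assay should be present under new.assay.name
  stopifnot(assay.name %in% Seurat::Assays(integrated))

  # Cell embeddings from the "pca" reduction
  pcs <- Seurat::Embeddings(integrated, reduction = "pca")
  n.dims <- min(dims, ncol(pcs))

  Seurat::RunUMAP(integrated, reduction = "pca", dims = seq_len(n.dims))
}
```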


carmonalab/STACAS documentation built on Feb. 3, 2024, 7:42 a.m.