clonoStats: Assign cell-level clonotypes and calculate abundances

clonoStats R Documentation

Description

Assign clonotype labels to cells and produce two summary tables: the clonotypes x samples table of abundances and the counts x samples table of clonotype frequencies.

Usage

clonoStats(x, ...)

## S4 method for signature 'SplitDataFrameList'
clonoStats(
  x,
  group = "sample",
  type = NULL,
  assignment = FALSE,
  method = "EM",
  lang = c("cpp", "r"),
  thresh = 0.01,
  iter.max = 1000,
  BPPARAM = SerialParam()
)

## S4 method for signature 'SingleCellExperiment'
clonoStats(x, contigs = "contigs", group = "sample", ...)

## S4 method for signature 'clonoStats'
clonoStats(x, group = NULL, lang = c("cpp", "r"))

Arguments

x

A SplitDataFrameList object containing V(D)J contig information, split by cell barcodes, as created by readVDJcontigs. Alternatively, a SingleCellExperiment object with such a SplitDataFrameList in the colData, as created by addVDJtoSCE.

...

additional arguments.

group

character. The name of the column in x (or the colData of x, for SingleCellExperiment objects) that stores each cell's group identity, typically either its sample of origin or cluster label. Alternatively, a vector of length equal to x (or ncol(x)) indicating the group identity. Providing this information can dramatically speed up computation. When running clonoStats for the first time on a dataset, we highly recommend setting the group identity to sample of origin to avoid unwanted cross-talk between samples.

type

character. The type of VDJ data (one of "TCR" or "BCR"). If NULL, this is determined by the most prevalent chain types in x.

assignment

logical. Whether to return the full nCells x nClonotypes sparse matrix of clonotype assignments (default = FALSE).

method

character. Which method to use for assigning cell-level clonotypes. Options are "EM" (default), "unique", or "CellRanger". Alternatively, this may be the name of a numeric column of the contig data or any chain type contained therein. See Details.

lang

character. Indicates which implementation of certain methods to use. The EM algorithm is implemented in both pure R ('r') and mixed R and C++ ('cpp', default) versions. Similarly, clonotype summarization is implemented in two ways, which can impact speed, regardless of choice of method.

thresh

numeric. Convergence threshold for the EM algorithm. Iteration stops once no count changes by more than this amount between updates. Only used if method = "EM".

iter.max

Maximum number of iterations for the EM algorithm. Only used if method = "EM".

BPPARAM

A BiocParallelParam object specifying the parallel backend for distributed clonotype assignment operations (split by group). Default is BiocParallel::SerialParam().

contigs

character. When x is a SingleCellExperiment, this is the name of the column in the colData of x that contains the VDJ contig data.

Details

Assign cells (with at least one V(D)J contig) to clonotypes and produce summary tables that can be used for downstream analysis. Clonotype assignment can be handled in multiple ways depending on the choice of "method":

  • "EM": Cells are assigned probabilistically to their most likely clonotype(s) with the Expectation-Maximization (EM) algorithm. For ambiguous cells, this leads to proportional (non-integer) assignment across multiple clonotypes and a frequency table of (non-integer) expected counts.

  • "unique": Cells are assigned a clonotype if (and only if) they can be uniquely assigned a single clonotype. For a T cell, this means having exactly one alpha chain and one beta chain.

  • "CellRanger": Clonotype labels are taken from contig data and matched across samples.

  • column name in contig data: Similar to "unique", but additionally, cells with multiples of a particular chain are assigned a "dominant" clonotype based on which contig has the higher value in this column (typical choices being "umis" or "reads").

  • type of chain in contig data: Clonotypes are based entirely on this type of chain (e.g., "TRA" or "TRB"), and cells may be assigned to multiple clonotypes if multiples of that chain are present.
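The difference between the "unique" rule and the column-based "dominant" rule can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy contig table, its column names (barcode, chain, cdr3, umis), and the helper functions assign_unique and assign_dominant are all hypothetical.

```r
# Toy contig table, one row per contig (hypothetical data, not from the package).
contigs_toy <- data.frame(
  barcode = c("c1", "c1", "c2", "c2", "c2", "c3"),
  chain   = c("TRA", "TRB", "TRA", "TRA", "TRB", "TRB"),
  cdr3    = c("CAV", "CAS", "CAL", "CAM", "CAT", "CAW"),
  umis    = c(5, 8, 2, 9, 4, 6),
  stringsAsFactors = FALSE
)

# "unique"-style rule: label a cell only if it has exactly one TRA and one TRB.
assign_unique <- function(df) {
  if (sum(df$chain == "TRA") == 1 && sum(df$chain == "TRB") == 1) {
    paste(df$cdr3[df$chain == "TRA"], df$cdr3[df$chain == "TRB"])
  } else {
    NA_character_
  }
}

# Column-based rule: keep the highest-valued contig of each chain type, then
# apply the same rule, so cells with duplicated chains get a "dominant" label.
assign_dominant <- function(df, column = "umis") {
  keep <- unlist(lapply(split(seq_len(nrow(df)), df$chain),
                        function(i) i[which.max(df[[column]][i])]))
  assign_unique(df[keep, , drop = FALSE])
}

by_cell <- split(contigs_toy, contigs_toy$barcode)
unique_labels   <- vapply(by_cell, assign_unique, character(1))
dominant_labels <- vapply(by_cell, assign_dominant, character(1))
```

In this toy data, cell c2 carries two alpha chains, so it is left unlabeled by the "unique"-style rule but receives a label from the column-based rule. Cell c3 has no alpha chain and is unlabeled under both.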

The "EM", "unique", and UMI/read-based quantification methods all define a clonotype as a pair of specific chains (alpha and beta for T cells, heavy and light for B cells). Unlike other methods, the EM algorithm assigns clonotypes probabilistically, which can lead to non-integer counts for cells with ambiguous information (i.e., only an alpha chain, or two alphas and one beta chain).
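To see how probabilistic assignment produces non-integer counts, here is a minimal base R sketch of a generic EM iteration on toy data. It is not the package's code: the compatibility matrix compat and the function em_counts are hypothetical, and the sketch assumes every cell is compatible with at least one clonotype. The thresh and iter.max arguments are named after the ones documented above and play the same role here.

```r
# Rows are cells, columns are candidate clonotypes; 1 marks a clonotype that is
# consistent with the chains observed in that cell (hypothetical toy data).
compat <- rbind(
  cell1 = c(A = 1, B = 0),
  cell2 = c(A = 1, B = 0),
  cell3 = c(A = 0, B = 1),
  cell4 = c(A = 1, B = 1)   # ambiguous: fits both clonotypes
)

em_counts <- function(compat, thresh = 0.01, iter.max = 1000) {
  # Start by splitting each cell evenly across its compatible clonotypes.
  counts <- colSums(compat / rowSums(compat))
  for (iter in seq_len(iter.max)) {
    theta <- counts / sum(counts)            # current clonotype proportions
    # E-step: weight each compatible clonotype by its current proportion.
    resp <- sweep(compat, 2, theta, `*`)
    resp <- resp / rowSums(resp)
    # M-step: expected counts are the column sums of the weights.
    new_counts <- colSums(resp)
    converged <- max(abs(new_counts - counts)) < thresh
    counts <- new_counts
    if (converged) break
  }
  list(counts = counts, assignment = resp)
}

fit <- em_counts(compat)
round(fit$counts, 2)
```

The ambiguous cell4 is shared between clonotypes A and B in proportion to how common each already is, so the expected counts still sum to the number of cells but are no longer integers.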

We highly recommend providing information on each cell's sample of origin, as this can speed up computation and provide more accurate results. This is particularly important for the EM algorithm, which shares information across cells in the same group, so splitting by sample can improve accuracy by removing extraneous clonotypes from the set of possibilities for a particular cell.
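As a usage sketch, the calls below use only arguments from the Usage section. They assume the VDJdive package is installed and that the example contigs data contains a "sample" column, which the default group = "sample" suggests. The object names stats_em and stats_unique are illustrative.

```r
library(VDJdive)
data("contigs")

# EM assignment (the default method), grouped by the "sample" column.
stats_em <- clonoStats(contigs, group = "sample")

# Deterministic alternative that also keeps the cell-level assignment matrix.
stats_unique <- clonoStats(contigs, method = "unique", assignment = TRUE)
```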

Value

Returns an object of class clonoStats, containing group-level clonotype summaries. May optionally include a sparse matrix of cell-level assignment information, if assignment = TRUE. If x is a SingleCellExperiment object, this output is added to the metadata.
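The relationship between the two group-level summary tables can be sketched in base R. This is an illustration on hypothetical toy data with integer assignments, not the package's summarization code. The row conventions of the package's own frequency table (for example, whether a row for zero counts is included) may differ.

```r
# Cells x clonotypes assignment matrix (hypothetical toy data).
assignment <- rbind(
  c(1, 0, 0),
  c(1, 0, 0),
  c(0, 1, 0),
  c(0, 0, 1),
  c(0, 0, 1)
)
colnames(assignment) <- c("clonoA", "clonoB", "clonoC")
group <- c("s1", "s1", "s1", "s2", "s2")   # sample of origin for each cell

# Clonotypes x samples abundance table: sum the cell rows within each sample.
abundance <- t(rowsum(assignment, group))

# Counts x samples frequency table: for each sample, the number of clonotypes
# observed exactly k times, for k = 1, ..., max count.
max_count <- max(abundance)
frequency <- matrix(
  apply(abundance, 2, tabulate, nbins = max_count),
  nrow = max_count,
  dimnames = list(as.character(seq_len(max_count)), colnames(abundance))
)
```

Here sample s1 has one clonotype seen once and one seen twice, while s2 has a single clonotype seen twice.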

See Also

clonoStats

Examples

library(VDJdive)
# load the example contig data shipped with the package
data('contigs')
clonoStats(contigs)


kstreet13/VDJdive documentation built on May 31, 2024, 1:26 p.m.