summarizeClonotypeCounts: Summarize clonotype counts


View source: R/summarizeClonotypeCounts.R

Description

Compute summary statistics describing clonal expansion, based on the number of cells assigned to each clonotype.

Usage

summarizeClonotypeCounts(
  counts,
  use.mean = TRUE,
  use.gini = TRUE,
  use.top = c(5, 20, 100),
  use.hill = 0:2,
  downsample = TRUE,
  down.ncells = NULL
)

Arguments

counts

A list of integer vectors, such as that produced by countCellsPerClonotype. Each vector corresponds to a group of cells and contains the number of cells assigned to each clonotype in that group.

use.mean

Logical scalar indicating whether to report the mean number of cells per clonotype.

use.gini

Logical scalar indicating whether to report the Gini index.

use.top

Integer vector specifying the numbers of largest clonotypes for which to report the proportion of cells they contain (see Details).

use.hill

Integer vector specifying the orders for which to compute Hill numbers.

downsample

Logical scalar indicating whether downsampling should be performed.

down.ncells

Integer scalar indicating the number of cells to downsample each group to. Defaults to the smallest number of sequence-containing cells across all groups in counts.

Details

If use.mean=TRUE, the output contains a numeric "mean" column holding the average number of cells per clonotype, computed across all clonotypes in each group. Larger values indicate that cells are concentrated within a small number of clonotypes.
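A minimal sketch of the "mean" statistic for a list shaped like counts. The helper mean_per_clonotype() and the toy data are hypothetical, and dropping zero counts is an assumption; this is not the package's implementation.

```r
# Hypothetical input in the format described for `counts`: one integer
# vector per group, each entry being the number of cells in one clonotype.
counts <- list(
    A = c(6L, 1L, 1L),  # 8 cells, dominated by one clonotype
    B = rep(1L, 8)      # 8 cells, each in its own clonotype
)

# Hypothetical helper: mean number of cells per clonotype in one group.
# Zero counts are dropped here by assumption, so clonotypes absent from
# the group do not pull the mean down.
mean_per_clonotype <- function(x) {
    mean(x[x > 0])
}

res <- vapply(counts, mean_per_clonotype, numeric(1))
res  # A: 8/3 (about 2.67), B: 1
```

Group A, where most cells share one clonotype, gets the larger mean, matching the interpretation given above.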

If use.gini=TRUE, the output contains a numeric "gini" column holding the Gini index for clonotype diversity in each group. Larger values indicate that cells are concentrated within a small number of clonotypes.
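The sketch below uses one standard form of the Gini coefficient, computed on a single group's counts sorted in ascending order. The helper gini_index() is hypothetical, and the package may use a different variant or normalization.

```r
# Hypothetical helper: Gini coefficient of the cell counts in one group,
# using the sorted-values form
#   G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n
gini_index <- function(x) {
    x <- sort(as.numeric(x[x > 0]))  # drop empty clonotypes, sort ascending
    n <- length(x)
    if (n == 0L) return(NA_real_)    # no clonotypes: undefined
    i <- seq_len(n)
    2 * sum(i * x) / (n * sum(x)) - (n + 1) / n
}

gini_index(c(6L, 1L, 1L))  # 5/12, about 0.417: uneven distribution
gini_index(rep(1L, 8))     # 0: perfectly even distribution
```

A value of 0 means every clonotype holds the same number of cells; the value grows as a few clonotypes absorb most of the cells.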

If use.top is specified, the output contains numeric "topX" columns holding the proportion of cells assigned to the top X clonotypes, where X is each entry of use.top. Larger values indicate that cells are concentrated within a small number of clonotypes.
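A minimal sketch of the "topX" proportions for a single group. The helper top_proportion() is hypothetical; it simply sorts the counts in decreasing order and sums the largest X of them.

```r
# Hypothetical helper: proportion of cells in the `top` largest clonotypes.
top_proportion <- function(x, top = c(5, 20, 100)) {
    x <- sort(as.numeric(x), decreasing = TRUE)  # largest clonotypes first
    total <- sum(x)
    # head() returns every element when k exceeds the number of clonotypes,
    # so the proportion is capped at 1.
    out <- vapply(top, function(k) sum(head(x, k)) / total, numeric(1))
    names(out) <- paste0("top", top)
    out
}

top_proportion(c(6L, 1L, 1L), top = c(1, 2, 5))
# top1 = 0.75, top2 = 0.875, top5 = 1
```

When a group has fewer clonotypes than X, the proportion in this sketch is simply 1.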

If use.hill is specified, the output contains numeric "hillX" columns holding the Hill numbers of order X, where X is each entry of use.hill. "hill0" is simply the number of observed clonotypes (i.e., species richness), while "hill1" and "hill2" quantify the evenness of the distribution of cells across clonotypes. ("hill2" differs from "hill1" in that the former gives more weight to dominant clonotypes.)
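A sketch of Hill numbers using their standard definition: for clonotype proportions p, order q gives (sum(p^q))^(1/(1-q)), with the q = 1 case taken as the limit exp(-sum(p * log(p))). The helper hill_numbers() is hypothetical, and the package's exact handling may differ.

```r
# Hypothetical helper: Hill numbers of the requested orders for one group.
hill_numbers <- function(x, orders = 0:2) {
    x <- as.numeric(x[x > 0])  # only observed clonotypes contribute
    p <- x / sum(x)            # proportion of cells in each clonotype
    out <- vapply(orders, function(q) {
        if (q == 1) {
            exp(-sum(p * log(p)))      # limit as q -> 1 (exp of Shannon entropy)
        } else {
            sum(p^q)^(1 / (1 - q))     # general order q
        }
    }, numeric(1))
    names(out) <- paste0("hill", orders)
    out
}

hill_numbers(rep(1L, 8))       # hill0 = hill1 = hill2 = 8 for an even group
hill_numbers(c(6L, 1L, 1L))    # hill0 = 3; hill1 and hill2 fall below 3
```

For a perfectly even group all orders coincide with the richness; as cells concentrate in a few clonotypes, the higher orders drop further below "hill0".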

See countClonotypesPerCell for why downsampling is turned on by default.
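A sketch of downsampling each group to a common number of cells before computing statistics. One common motivation is that statistics such as "hill0" grow with the number of cells sampled, so groups of different sizes are not directly comparable; the package's own rationale is given in the help page referenced above. The helper downsample_counts() is hypothetical, and whether the package samples cells in exactly this way is an assumption.

```r
# Hypothetical helper: randomly downsample one group's clonotype counts to
# `ncells` cells, sampling cells without replacement.
downsample_counts <- function(x, ncells) {
    stopifnot(ncells <= sum(x))
    # Expand the counts into one clonotype index per cell.
    cells <- rep(seq_along(x), times = x)
    # Index with sample.int() to avoid sample()'s special case for length-1 input.
    kept <- cells[sample.int(length(cells), ncells)]
    # Re-count cells per clonotype; clonotypes that lost all cells get 0.
    out <- tabulate(kept, nbins = length(x))
    names(out) <- names(x)
    out
}

counts <- list(A = c(6L, 1L, 1L), B = c(10L, 5L, 5L))

# Common target size: the smallest total across groups (8 cells here),
# mirroring the documented default for down.ncells.
ncells <- min(vapply(counts, sum, numeric(1)))

set.seed(42)
down <- lapply(counts, downsample_counts, ncells = ncells)
vapply(down, sum, numeric(1))  # both groups now contain 8 cells
```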

Value

A DataFrame with one row per group in counts, containing summary statistics on the clonal expansion in that group.

Author(s)

Aaron Lun

Examples

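library(RepertoireUtils)
set.seed(100) # for reproducible random sampling

# Mock data: 30 sequences randomly assigned to cells and 5 clonotypes.
df <- data.frame(
    cell.id=sample(LETTERS, 30, replace=TRUE),
    clonotype=sample(paste0("clonotype_", 1:5), 30, replace=TRUE)
)

# Split sequences by cell, then count cells per clonotype in 3 random groups.
y <- splitDataFrameByCell(df, field="cell.id")
out <- countCellsPerClonotype(y, "clonotype",
    group=sample(3, length(y), replace=TRUE))

summarizeClonotypeCounts(out)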

LTLA/RepertoireUtils documentation built on Feb. 9, 2020, 12:51 p.m.