summarizeClonotypeCounts: Summarize clonotype counts
In LTLA/RepertoireUtils: Utility Functions for Analyzing Repertoire Sequencing Data


View source: R/summarizeClonotypeCounts.R

Generate some summary statistics for clonal expansion based on the number of cells per clonotype.

summarizeClonotypeCounts(
  counts,
  use.mean = TRUE,
  use.gini = TRUE,
  use.top = c(5, 20, 100),
  use.hill = 0:2,
  downsample = TRUE,
  down.ncells = NULL
)

`counts`	A list of integer vectors such as that produced by `countCellsPerClonotype`. Each vector corresponds to a group of cells and contains the number of cells for each clonotype in that group.
`use.mean`	Logical scalar indicating whether to report the mean number of cells per clonotype.
`use.gini`	Logical scalar indicating whether to report the Gini index.
`use.top`	Integer vector specifying the number of clonotypes to use to compute the top percentage.
`use.hill`	Integer vector specifying the orders to use to compute Hill numbers.
`downsample`	Logical scalar indicating whether downsampling should be performed.
`down.ncells`	Integer scalar indicating the number of cells to downsample each group to. Defaults to the smallest number of sequence-containing cells across all groups in `counts`.

If use.mean=TRUE, the output will contain the numeric "mean" column, containing the average number of cells per clonotype computed across all clonotypes for a given group. Larger values indicate that cells are concentrated within a small number of clonotypes.

If use.gini=TRUE, the output will contain the numeric "gini" column, containing the Gini index for clonotype diversity in each group. Larger values indicate that cells are concentrated within a small number of clonotypes.
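To make the interpretation of the "gini" column concrete, here is a minimal base R sketch of a Gini index on a vector of clonotype sizes. The helper gini_index is hypothetical and not part of RepertoireUtils; the package may use a different estimator (for example, one with a small-sample correction), so the exact values need not match.

```r
# Hypothetical helper, not part of RepertoireUtils:
# Gini index of a vector of per-clonotype cell counts.
gini_index <- function(x) {
    x <- sort(x)
    n <- length(x)
    # Standard formula for sorted values x_(1) <= ... <= x_(n):
    # G = sum_i (2i - n - 1) * x_(i) / (n * sum(x))
    sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

even   <- gini_index(c(5, 5, 5, 5))   # all clonotypes the same size
skewed <- gini_index(c(1, 1, 1, 17))  # one dominant clonotype
c(even = even, skewed = skewed)
```

The even group gives 0 and the skewed group gives 0.6, matching the statement that larger values indicate concentration of cells in few clonotypes.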

If use.top is specified, the output will contain the numeric "topX" columns, containing the proportion of cells assigned to the top X clonotypes (where X is the value of each use.top entry). Larger values indicate that cells are concentrated within a small number of clonotypes.
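The "topX" statistic can be sketched in base R as follows. The helper top_proportion is hypothetical and only illustrates the definition given above; in particular, how the package itself handles groups with fewer than X clonotypes is not stated here, and this sketch simply uses all available clonotypes in that case.

```r
# Hypothetical helper, not part of RepertoireUtils:
# proportion of cells that belong to the `top` largest clonotypes.
top_proportion <- function(x, top) {
    x <- sort(x, decreasing = TRUE)
    # head() returns at most length(x) elements, so a `top` larger than
    # the number of clonotypes uses every clonotype (assumption of this sketch).
    sum(head(x, top)) / sum(x)
}

sizes <- c(10, 6, 2, 1, 1)   # cells per clonotype in one group
top2 <- top_proportion(sizes, 2)
top2
```

Here the two largest clonotypes hold 16 of 20 cells, so the result is 0.8.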

If use.hill is specified, the output will contain numeric "hillX" columns containing the Hill numbers. "hill0" is simply the number of observed clonotypes (i.e., species richness) while "hill1" and "hill2" quantify the evenness of the distribution of cells across clonotypes. ("hill2" differs from "hill1" in that the former gives more weight to dominant clonotypes.)
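For reference, the textbook definition of the Hill number of order q is (sum of p_i^q)^(1/(1-q)), where p_i is the proportion of cells in clonotype i, with the q=1 case defined as the exponential of the Shannon entropy. The base R sketch below follows that definition; hill_number is a hypothetical helper, and the package's own implementation may differ in details.

```r
# Hypothetical helper, not part of RepertoireUtils:
# Hill number of order q for a vector of per-clonotype cell counts.
hill_number <- function(x, q) {
    p <- x[x > 0] / sum(x)   # clonotype proportions, dropping empty clonotypes
    if (q == 1) {
        # Limit as q -> 1: exponential of the Shannon entropy.
        exp(-sum(p * log(p)))
    } else {
        sum(p^q)^(1 / (1 - q))
    }
}

sizes <- c(10, 6, 2, 1, 1)
hills <- sapply(0:2, hill_number, x = sizes)
names(hills) <- paste0("hill", 0:2)
hills
```

With these counts, "hill0" is 5 (the number of observed clonotypes), and the values decrease as the order increases because higher orders give more weight to the dominant clonotypes.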

See countClonotypesPerCell for why downsampling is turned on by default.
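As a rough illustration of what downsampling to a common number of cells means, the base R sketch below subsamples cells without replacement in each group and recounts the clonotypes. The helper downsample_counts is hypothetical, and the package's actual procedure is an assumption here; this only shows the general idea behind `downsample` and `down.ncells`.

```r
# Hypothetical helper, not part of RepertoireUtils:
# randomly keep `ncells` cells from a group and recount per clonotype.
downsample_counts <- function(x, ncells) {
    cells <- rep(seq_along(x), x)                  # one entry per cell, labelled by clonotype index
    kept <- cells[sample.int(length(cells), ncells)]  # sample cells without replacement
    tab <- tabulate(kept, nbins = length(x))
    tab[tab > 0]                                   # drop clonotypes that lost all their cells
}

set.seed(1)
groups <- list(A = c(10, 6, 2, 1, 1), B = c(3, 2, 1))
target <- min(sapply(groups, sum))   # smallest group size, here 6 cells
down <- lapply(groups, downsample_counts, ncells = target)
sapply(down, sum)                    # every group now has `target` cells
```

After this step every group contains the same number of cells, so statistics that depend on the total number of cells are compared on an equal footing across groups.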

A DataFrame with one row per group in counts, containing summary statistics on the clonal expansion in that group.

Aaron Lun

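library(RepertoireUtils)

# Simulate 30 sequences, each assigned to a random cell and a random clonotype.
df <- data.frame(
    cell.id=sample(LETTERS, 30, replace=TRUE),
    clonotype=sample(paste0("clonotype_", 1:5), 30, replace=TRUE)
)

# Split the sequences by cell, then count cells per clonotype in 3 random groups.
y <- splitDataFrameByCell(df, field="cell.id")
out <- countCellsPerClonotype(y, "clonotype",
   group=sample(3, length(y), replace=TRUE))

# One row of summary statistics per group.
summarizeClonotypeCounts(out)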