summarizeGeneComboCounts: Summarize gene combination counts
In LTLA/RandomGrabBag: Utility Functions for Analyzing Repertoire Sequencing Data

Compute summary statistics for the diversity of gene combinations in each group, based on the number of cells expressing each gene combination.

summarizeGeneComboCounts(
  counts,
  use.gini = TRUE,
  use.top = c(5, 20, 100),
  use.hill = 0:2,
  downsample = TRUE,
  down.ncells = NULL
)

`counts`	A SummarizedExperiment containing cell counts for each gene combination (row) and group (column), such as that produced by `countCellsPerGeneCombo`. Alternatively, a count matrix containing the same information.
`use.gini`	Logical scalar indicating whether to report the Gini index.
`use.top`	Integer vector specifying the numbers of top gene combinations for which to compute the proportion of cells.
`use.hill`	Integer vector specifying the orders to use to compute Hill numbers.
`downsample`	Logical scalar indicating whether downsampling should be performed.
`down.ncells`	Integer scalar indicating the number of cells to downsample each group to. Defaults to the smallest number of sequence-containing cells across all groups (columns) in `counts`.

If use.gini=TRUE, the output will contain the numeric "gini" column, containing the Gini index for gene combination diversity in each group. Larger values indicate that a small number of gene combinations are expressed in many cells.
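To make the interpretation of the Gini index concrete, here is a minimal base R sketch of one common formula applied to a vector of per-combination cell counts. The helper name `gini_sketch` is hypothetical and not part of the package, and the package's own computation may differ in detail (for example, in how zero counts are treated).

```r
# Hypothetical helper (not the package's implementation):
# Gini index of a vector of non-negative counts.
gini_sketch <- function(x) {
    x <- sort(as.numeric(x))   # order counts from smallest to largest
    n <- length(x)
    # Rank-weighted sum of the sorted counts, scaled by n times the total.
    sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

even <- c(5, 5, 5, 5)      # cells spread evenly across combinations
skewed <- c(1, 1, 1, 17)   # one combination dominates

gini_sketch(even)    # 0
gini_sketch(skewed)  # 0.6
```

The evenly spread group gives 0, while the group dominated by one combination gives a larger value, matching the interpretation above.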

If use.top is specified, the output will contain the numeric "topX" columns, containing the proportion of cells expressing the top X gene combinations (where X is the value of each use.top entry). Larger values indicate that a small number of gene combinations are expressed in many cells.
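The "topX" statistic can be illustrated with a short base R sketch, assuming it is the fraction of all cells that fall into the X most abundant gene combinations. The helper `top_prop_sketch` is a hypothetical name used only for illustration; whether the package reports a proportion or a percentage should be checked against its actual output.

```r
# Hypothetical helper (not the package's implementation):
# proportion of cells in the `top` most abundant combinations.
top_prop_sketch <- function(x, top) {
    x <- sort(x, decreasing = TRUE)   # most abundant combinations first
    out <- vapply(top, function(k) sum(head(x, k)) / sum(x), numeric(1))
    setNames(out, paste0("top", top)) # mimic the "topX" naming
}

counts <- c(50, 20, 10, 10, 5, 5)     # 100 cells over 6 combinations
top_prop_sketch(counts, c(1, 3, 20))
# top1 = 0.5, top3 = 0.8, top20 = 1
```

If X exceeds the number of observed combinations, as with `top20` here, this sketch simply returns 1.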

If use.hill is specified, the output will contain numeric "hillX" columns containing the Hill numbers. "hill0" is simply the number of observed gene combinations (i.e., species richness) while "hill1" and "hill2" quantify the evenness of the distribution of cells across gene combinations. ("hill2" differs from "hill1" in that the former gives more weight to dominant gene combinations.)
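As an illustration of how Hill numbers of different orders behave, the following base R sketch applies the standard definition to a vector of cell counts. The helper `hill_sketch` is hypothetical, and the package's own computation may differ in detail.

```r
# Hypothetical helper (not the package's implementation):
# Hill number of order q for a vector of counts.
hill_sketch <- function(x, q) {
    p <- x[x > 0] / sum(x)            # proportions of observed combinations
    if (q == 1) {
        exp(-sum(p * log(p)))         # limit as q -> 1: exp(Shannon entropy)
    } else {
        sum(p^q)^(1 / (1 - q))
    }
}

even <- c(10, 10, 10, 10)
skewed <- c(37, 1, 1, 1)

# For an even distribution, every order equals the number of combinations.
vapply(0:2, function(q) hill_sketch(even, q), numeric(1))

# For a skewed distribution, order 0 still counts all four combinations,
# while orders 1 and 2 shrink towards 1.
vapply(0:2, function(q) hill_sketch(skewed, q), numeric(1))
```

Higher orders weight dominant combinations more heavily, so for the skewed input the order 2 value is the smallest of the three.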

See countCellsPerGeneCombo for an explanation of why downsampling is turned on by default.
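For intuition about what downsampling a group to a fixed number of cells involves, here is a rough base R sketch that samples cells without replacement from a count vector. The helper `downsample_sketch` is hypothetical, and the package may use a different sampling scheme internally.

```r
# Hypothetical helper (not the package's implementation):
# downsample a vector of integer counts to `n` cells in total.
downsample_sketch <- function(x, n) {
    pool <- rep(seq_along(x), x)               # one entry per cell, labelled by combination
    kept <- pool[sample.int(length(pool), n)]  # draw n cells without replacement
    tabulate(kept, nbins = length(x))          # recount cells per combination
}

set.seed(1)
x <- c(30, 15, 5)            # 50 cells across 3 combinations
down <- downsample_sketch(x, 20)
sum(down)                    # 20
```

Bringing every group to the same total makes statistics such as the number of observed combinations comparable between groups of different sizes.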

A DataFrame with one row per group in counts, containing summary statistics on the diversity of gene expression in that group.

Aaron Lun

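# Mock up some repertoire data with V and J gene calls for each sequence.
df <- data.frame(
    cell.id=sample(LETTERS, 30, replace=TRUE),
    v_gene=sample(c("TRAV1", "TRAV2", "TRAV3"), 30, replace=TRUE),
    j_gene=sample(c("TRAJ4", "TRAJ5", "TRAJ6"), 30, replace=TRUE)
)

# Count cells per V/J gene combination, with cells randomly assigned to groups.
y <- splitDataFrameByCell(df, field="cell.id")
out <- countCellsPerGeneCombo(y, c("v_gene", "j_gene"),
    group=sample(10, length(y), replace=TRUE))

# Summarize gene combination diversity in each group.
summarizeGeneComboCounts(out)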