summarizeGeneComboCounts: Summarize gene combination counts

Description

Compute summary statistics for gene combination diversity in each group, based on the number of cells expressing each gene combination.

Usage

summarizeGeneComboCounts(
  counts,
  use.gini = TRUE,
  use.top = c(5, 20, 100),
  use.hill = 0:2,
  downsample = TRUE,
  down.ncells = NULL
)

Arguments

counts

A SummarizedExperiment containing cell counts for each gene combination (row) and group (column), such as that produced by countCellsPerGeneCombo. Alternatively, a count matrix containing the same information.

use.gini

Logical scalar indicating whether to report the Gini index.

use.top

Integer vector specifying the numbers of top gene combinations to use when computing the top proportions.

use.hill

Integer vector specifying the orders to use to compute Hill numbers.

downsample

Logical scalar indicating whether downsampling should be performed.

down.ncells

Integer scalar indicating the number of cells to downsample each group to. Defaults to the smallest number of sequence-containing cells across all groups (columns) in counts.

Details

If use.gini=TRUE, the output will contain the numeric "gini" column, containing the Gini index for gene combination diversity in each group. Larger values indicate that a small number of gene combinations are expressed in many cells.
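As an illustration of how a Gini index behaves on per-combination cell counts, the following is a minimal sketch for a single group. The helper gini_index is hypothetical and not part of the package. It uses the standard sorted-rank formula and drops combinations with zero cells, which is an assumption; the function's own calculation may differ in such details.

```r
# Hypothetical helper (not part of the package): Gini index of a count vector.
gini_index <- function(x) {
    x <- sort(x[x > 0])   # assumption: combinations with zero cells are ignored
    n <- length(x)
    2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

even   <- c(10, 10, 10, 10)   # cells spread evenly over four combinations
skewed <- c(37, 1, 1, 1)      # one combination dominates

gini_index(even)     # 0: perfectly even
gini_index(skewed)   # 0.675: most cells share one combination
```

The skewed vector yields the larger value, matching the interpretation above.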

If use.top is specified, the output will contain the numeric "topX" columns, containing the proportion of cells expressing the top X gene combinations (where X is the value of each use.top entry). Larger values indicate that a small number of gene combinations are expressed in many cells.
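The "topX" statistic can be sketched for a single group as follows. The helper top_prop is hypothetical and only mirrors the description above (proportion of cells in the X most abundant gene combinations); it is not the package's implementation.

```r
# Hypothetical helper (not part of the package): proportion of cells in the
# 'top' most abundant gene combinations of one group.
top_prop <- function(x, top) {
    x <- sort(x, decreasing = TRUE)
    out <- vapply(top, function(k) sum(head(x, k)) / sum(x), numeric(1))
    names(out) <- paste0("top", top)
    out
}

x <- c(50, 30, 10, 5, 5)    # cells per gene combination
top_prop(x, c(1, 2, 10))    # top1 = 0.5, top2 = 0.8, top10 = 1
```

When X exceeds the number of observed combinations, as with top10 here, the proportion in this sketch is simply 1.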

If use.hill is specified, the output will contain numeric "hillX" columns containing the Hill numbers. "hill0" is simply the number of observed gene combinations (i.e., species richness) while "hill1" and "hill2" quantify the evenness of the distribution of cells across gene combinations. ("hill2" differs from "hill1" in that the former gives more weight to dominant gene combinations.)
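For reference, a Hill number of order q is conventionally defined as (sum of p^q)^(1/(1-q)), with the limit exp(-sum(p * log(p))) at q = 1, where p holds the proportions of cells in each gene combination. The sketch below applies that textbook definition to one group; hill_number is a hypothetical helper and is not claimed to match the package's internal code.

```r
# Hypothetical helper (not part of the package): Hill number of order q.
hill_number <- function(x, q) {
    p <- x[x > 0] / sum(x)        # proportions over observed combinations
    if (q == 1) {
        exp(-sum(p * log(p)))     # limit as q -> 1 (exponential of Shannon entropy)
    } else {
        sum(p^q)^(1 / (1 - q))
    }
}

even   <- c(10, 10, 10, 10)
skewed <- c(37, 1, 1, 1)

sapply(0:2, hill_number, x = even)     # all orders equal 4 for an even distribution
sapply(0:2, hill_number, x = skewed)   # hill0 stays 4; hill1 and hill2 shrink towards 1
```

Both vectors have the same richness (hill0), but the higher orders drop for the skewed vector because they weight the dominant combination more heavily.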

See countCellsPerGeneCombo for an explanation of why downsampling is turned on by default.
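To give a rough idea of what downsampling a group to a fixed number of cells involves, the sketch below draws cells without replacement from a vector of per-combination counts. This is an assumption made for illustration: downsample_counts is a hypothetical helper, and the procedure actually used is the one documented in countCellsPerGeneCombo.

```r
# Hypothetical helper (not part of the package): downsample one group's
# per-combination cell counts to 'ncells' cells, without replacement.
downsample_counts <- function(x, ncells) {
    cells <- rep(seq_along(x), x)                    # one entry per cell, labelled by combination
    kept <- cells[sample.int(length(cells), ncells)]
    tabulate(kept, nbins = length(x))                # recount cells per combination
}

set.seed(1)
x <- c(20, 5, 0, 15)          # 40 cells across four gene combinations
downsample_counts(x, 10)      # counts summing to 10 cells
```

Equalizing the number of cells in this way keeps diversity statistics such as richness from simply tracking group size.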

Value

A DataFrame with one row per group in counts, containing summary statistics on the diversity of gene expression in that group.

Author(s)

Aaron Lun

Examples

df <- data.frame(
    cell.id=sample(LETTERS, 30, replace=TRUE),
    v_gene=sample(c("TRAV1", "TRAV2", "TRAV3"), 30, replace=TRUE),
    j_gene=sample(c("TRAJ4", "TRAJ5", "TRAJ6"), 30, replace=TRUE)
)

y <- splitDataFrameByCell(df, field="cell.id")
out <- countCellsPerGeneCombo(y, c("v_gene", "j_gene"),
   group=sample(10, length(y), replace=TRUE))

summarizeGeneComboCounts(out)

LTLA/RandomGrabBag documentation built on Feb. 8, 2020, 12:30 p.m.