countSequencesPerCell: Count sequences per cell


View source: R/countSequencesPerCell.R

Description

Count the number of sequences per cell, possibly after filtering.

Usage

countSequencesPerCell(x, filter.field = NULL, filter.value = NULL)

Arguments

x

A SplitDataFrameList where each DataFrame corresponds to a cell and each row to a sequence.

filter.field

Character vector specifying the columns on which to filter sequences prior to counting.

filter.value

Character vector of the same length as filter.field, specifying the value to retain for each filter field.

Details

The number of sequences per cell is often a useful diagnostic. At its simplest, we can use it to determine whether a particular cell contributes to the immune repertoire at all, e.g., to verify that clusters annotated as B or T cells actually contain cells with receptor sequences.

A more complex use case is to identify cells that express multiple sequences. This is generally uncommon due to allelic exclusion in most cells (see also topCoveragePropPerCell), but its frequency can be inflated by technical artifacts such as doublets or contamination from ambient RNA.
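Both diagnostics above amount to binning cells by their count. The sketch below does this in base R on a small named integer vector of made-up, illustrative counts (not real results); in practice the vector would come from a call such as countSequencesPerCell(y). The category names "none", "single" and "multiple" are hypothetical labels chosen for illustration.

```r
# Illustrative per-cell counts (toy values, not output from real data).
counts <- c(A = 0L, B = 1L, C = 2L, D = 1L, E = 3L)

# Bin cells by how many sequences they carry:
# (-Inf, 0] -> "none", (0, 1] -> "single", (1, Inf) -> "multiple".
category <- cut(counts, breaks = c(-Inf, 0, 1, Inf),
                labels = c("none", "single", "multiple"))
names(category) <- names(counts)
print(table(category))

# Cells with more than one sequence are candidates for doublets
# or ambient contamination and may warrant closer inspection.
multi <- names(counts)[counts > 1]
print(multi)
```

Cells in the "none" bin do not contribute to the repertoire, while the "multiple" bin flags cells worth checking against topCoveragePropPerCell.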

The filtering enables us to perform more complex diagnostics, e.g., count the number of productive, full-length, high-quality sequences in each cell.
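To make the filtering semantics concrete, here is a minimal base-R sketch on a plain data.frame rather than a SplitDataFrameList. It is not the package's implementation: the function name count_per_cell_sketch is hypothetical, and it assumes that multiple filter fields are combined with AND and that a cell whose sequences are all filtered out reports a count of zero.

```r
# Hypothetical sketch of per-cell counting with optional filters.
# Assumption: multiple filter fields are combined with AND.
count_per_cell_sketch <- function(df, cell.field, filter.field = NULL,
                                  filter.value = NULL) {
    stopifnot(length(filter.field) == length(filter.value))
    # Start by retaining every row; each filter narrows the selection.
    keep <- rep(TRUE, nrow(df))
    for (i in seq_along(filter.field)) {
        keep <- keep & df[[filter.field[i]]] == filter.value[i]
    }
    # Build the cell factor before filtering so that cells losing all
    # their sequences still appear, with a count of zero.
    cells <- factor(df[[cell.field]])
    counts <- tabulate(cells[keep], nbins = nlevels(cells))
    names(counts) <- levels(cells)
    counts
}

df <- data.frame(
    cell.id = c("A", "A", "B", "C", "C", "C"),
    productive = c("True", "False", "False", "True", "True", "False")
)
all.counts <- count_per_cell_sketch(df, "cell.id")
prod.counts <- count_per_cell_sketch(df, "cell.id", "productive", "True")
print(all.counts)
print(prod.counts)
```

Here cell B has one sequence in total but zero productive sequences, which is exactly the kind of cell the filtered diagnostic is meant to expose.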

Value

An integer vector containing the number of (filtered) sequences for each cell.

Author(s)

Aaron Lun

Examples

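library(RepertoireUtils)

# Toy TCR alpha data: 30 sequences randomly assigned to cells A-Z.
set.seed(42)
df <- data.frame(
    cell.id=sample(LETTERS, 30, replace=TRUE),
    v_gene=sample(c("TRAV1", "TRAV2", "TRAV3"), 30, replace=TRUE),
    j_gene=sample(c("TRAJ4", "TRAJ5", "TRAJ6"), 30, replace=TRUE),
    productive=sample(c("True", "False"), 30, replace=TRUE)
)

# One DataFrame per cell, one row per sequence.
y <- splitDataFrameByCell(df, field="cell.id")

# Total number of sequences per cell.
countSequencesPerCell(y)

# Number of productive sequences per cell.
countSequencesPerCell(y, filter.field="productive", filter.value="True")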

LTLA/RepertoireUtils documentation built on Feb. 9, 2020, 12:51 p.m.