countSequencesPerCell: Count sequences per cell
In LTLA/RepertoireUtils: Utility Functions for Analyzing Repertoire Sequencing Data


Count the number of sequences per cell, possibly after filtering.

countSequencesPerCell(x, filter.field = NULL, filter.value = NULL)

`x`	A SplitDataFrameList where each DataFrame is a cell and each row is a sequence.
`filter.field`	Character vector specifying the columns on which to filter sequences prior to counting.
`filter.value`	Character vector of length equal to `filter.field`, specifying the values to retain for each filter field.

The number of sequences per cell is often a useful diagnostic. At its simplest, we can use it to determine whether a particular cell contributes to the immune repertoire at all, e.g., to verify which clusters consist of B or T cells.

A more complex use case is to identify cells that express multiple sequences. This is generally a minority occurrence due to allelic exclusion in most cells (see also topCoveragePropPerCell) but can be inflated by technical artifacts such as doublets or contamination from ambient noise.
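As a rough illustration of both diagnostics, the sketch below flags cells with no sequences and cells with multiple sequences. It uses a hand-made `counts` vector in place of real output and assumes that the per-cell counts come back as an integer vector named by cell; the names and values are hypothetical.

```r
# Hypothetical per-cell counts, standing in for the output of countSequencesPerCell().
counts <- c(A = 1L, B = 0L, C = 3L, D = 1L, E = 2L)

# Cells that contribute to the repertoire at all.
has.any <- counts > 0L

# Cells expressing more than one sequence (possible doublets or ambient contamination).
is.multi <- counts > 1L

names(counts)[!has.any]   # cells with no detected sequences
names(counts)[is.multi]   # cells with multiple sequences

# Proportion of multi-sequence cells among cells with at least one sequence.
prop.multi <- mean(is.multi[has.any])
prop.multi
```

An unusually high proportion of multi-sequence cells would be a reason to inspect the data for the technical artifacts mentioned above.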

The filtering enables us to perform more complex diagnostics, e.g., count the number of productive, full-length, high-quality sequences in each cell.
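To make the filter-then-count idea concrete, here is a minimal base R sketch on a plain data frame. It is not the package's implementation; the `full_length` column and the toy values are assumptions chosen for illustration, and each entry of `filter.field` is paired with the entry of `filter.value` at the same position.

```r
# Hypothetical toy data: one row per sequence, 'cell.id' identifies the cell.
df <- data.frame(
    cell.id = c("A", "A", "B", "C", "C", "C"),
    productive = c("True", "True", "True", "False", "False", "True"),
    full_length = c("True", "True", "True", "True", "False", "False"),
    stringsAsFactors = FALSE
)

filter.field <- c("productive", "full_length")
filter.value <- c("True", "True")

# Retain a sequence only if it matches the requested value in every filter field.
keep <- rep(TRUE, nrow(df))
for (i in seq_along(filter.field)) {
    keep <- keep & df[[filter.field[i]]] == filter.value[i]
}

# Use a factor so that cells with no retained sequences are still reported as zero.
cells <- factor(df$cell.id)
tab <- table(cells[keep])
counts <- setNames(as.integer(tab), names(tab))
counts
```

Here cell C still appears in the result with a count of zero, which is what makes the filtered count usable as a per-cell diagnostic.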

An integer vector containing the number of (filtered) sequences in each cell, with one entry per cell in `x`.

Aaron Lun

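library(RepertoireUtils)

# Mock up a table with one row per sequence and a column of cell identifiers.
df <- data.frame(
    cell.id=sample(LETTERS, 30, replace=TRUE),
    v_gene=sample(c("TRAV1", "TRAV2", "TRAV3"), 30, replace=TRUE),
    j_gene=sample(c("TRAJ4", "TRAJ5", "TRAJ6"), 30, replace=TRUE),
    productive=sample(c("True", "False"), 30, replace=TRUE)
)

# Split into one DataFrame per cell.
y <- splitDataFrameByCell(df, field="cell.id")

# Count all sequences per cell.
countSequencesPerCell(y)

# Count only the productive sequences per cell.
countSequencesPerCell(y, filter.field="productive", filter.value="True")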