readToUmiPerCell: Read to UMI per cell
In LTLA/RepertoireUtils: Utility Functions for Analyzing Repertoire Sequencing Data


Compute the read-to-UMI ratio for each cell.

readToUmiPerCell(x, read.field, umi.field)

`x`	A SplitDataFrameList where each DataFrame is a cell and each row is a sequence.
`read.field`	String specifying the name of the column that holds the read counts.
`umi.field`	String specifying the name of the column that holds the UMI counts.

This function is designed to evaluate the degree of redundancy in the read coverage of each UMI. High values indicate that the reads are highly redundant such that little can be gained from further sequencing.

Note that, in repertoire data, the definition of “high” is somewhat different from usual. This is because only deeply sequenced transcripts will survive the assembly and annotation process, such that the reported sequences are likely to be biased towards very high read-to-UMI ratios. Values around 1000 seem to be typical.

If a cell has multiple sequences, their counts are simply added together across sequences to compute the per-cell ratio.
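
The pooling described above can be illustrated with a minimal base R sketch. It assumes a plain data.frame with a hypothetical `cell.id` column rather than a SplitDataFrameList, and it is not the package's actual implementation; the toy counts are invented for illustration only.

```r
# Hypothetical toy data: one row per sequence, 'cell.id' identifies the cell.
df <- data.frame(
    cell.id = c("A", "A", "B", "C", "C", "C"),
    reads   = c(1200, 300, 500, 3000, 1000, 2000),
    umis    = c(2, 1, 1, 3, 1, 2)
)

# Sum the read and UMI counts across all sequences of each cell.
total.reads <- tapply(df$reads, df$cell.id, sum)
total.umis  <- tapply(df$umis, df$cell.id, sum)

# Per-cell ratio of total reads to total UMIs.
ratio <- total.reads / total.umis
ratio
```

Because the counts are pooled before dividing, cell A gets 1500/3 = 500 rather than the mean of its per-sequence ratios (600 and 300, averaging 450).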

A numeric vector containing the ratio of reads to UMIs for each cell.

Aaron Lun

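# Load the package that provides splitDataFrameByCell() and readToUmiPerCell().
library(RepertoireUtils)

# Simulate 30 sequences with random cell assignments and counts.
df <- data.frame(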
    cell.id=sample(LETTERS, 30, replace=TRUE),
    v_gene=sample(c("TRAV1", "TRAV2", "TRAV3"), 30, replace=TRUE),
    j_gene=sample(c("TRAJ4", "TRAJ5", "TRAJ6"), 30, replace=TRUE),
    reads=rnbinom(30, mu=20, size=0.5),
    umis=rnbinom(30, mu=2, size=1)
)

y <- splitDataFrameByCell(df, field="cell.id")
readToUmiPerCell(y, "reads", "umis")