createPerCellDataFrame: Collapse to one cell per row
In LTLA/RandomGrabBag: Utility Functions for Analyzing Repertoire Sequencing Data


Collapse a SplitDataFrameList representation into a DataFrame with one row per cell.

createPerCellDataFrame(x, cov.field, fill = TRUE)

`x`	A SplitDataFrameList object containing one DataFrame per cell, where each row of each DataFrame contains information for one sequence in that cell.
`cov.field`	String specifying the column of `x` containing the UMI/read count per sequence.
`fill`	Logical scalar indicating whether cells with no sequences should be filled in with `NA` in the output.

This function collapses the SplitDataFrameList into a DataFrame such that each cell is represented by exactly one row. If a cell has multiple sequences, one representative sequence is chosen:

If `cov.field` is specified, the sequence with the largest count is selected. This favors the dominant sequence with the highest number of captured molecules.
Otherwise, the first sequence for each cell is selected. This is effectively an arbitrary choice as the ordering of sequences has no meaning.

If a cell has no sequences, its row in the output is filled with `NA` when `fill=TRUE`. Otherwise, the cell is omitted from the output.
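
The selection rule above can be illustrated with a minimal base R sketch. This is not the package's implementation: `pick_one`, `seqs` and `all.cells` are hypothetical names introduced here, the sketch works on a plain `data.frame` rather than a SplitDataFrameList, and breaking ties by taking the first maximum is an assumption.

```r
# Hypothetical toy data: one row per sequence, several sequences per cell.
seqs <- data.frame(
    cell.id = c("A", "A", "B", "C", "C", "C"),
    v_gene  = c("TRAV1", "TRAV2", "TRAV3", "TRAV1", "TRAV2", "TRAV3"),
    umi     = c(2, 5, 1, 3, 7, 4),
    stringsAsFactors = FALSE
)
all.cells <- c("A", "B", "C", "D")  # cell "D" has no sequences

# Hypothetical helper (not part of the package): pick one row per cell.
pick_one <- function(df, cells, cov.field = NULL, fill = TRUE) {
    # Row indices per cell; cells without sequences get an empty vector.
    by.cell <- split(seq_len(nrow(df)), factor(df$cell.id, levels = cells))

    keep <- vapply(by.cell, function(idx) {
        if (length(idx) == 0L) return(NA_integer_)  # no sequences
        if (is.null(cov.field)) return(idx[1])       # first sequence
        idx[which.max(df[[cov.field]][idx])]         # largest count (ties: first)
    }, integer(1))

    # With fill = FALSE, cells without sequences are dropped.
    if (!fill) keep <- keep[!is.na(keep)]

    out <- df[keep, , drop = FALSE]  # an NA index yields an all-NA row
    out$cell.id <- names(keep)       # restore the cell label on filled rows
    rownames(out) <- NULL
    out
}

pick_one(seqs, all.cells, "umi")         # A and C keep their highest-UMI row
pick_one(seqs, all.cells)                # first sequence of each cell
pick_one(seqs, all.cells, fill = FALSE)  # "D" is omitted
```

Indexing a `data.frame` with `NA` produces a row of missing values, which is what gives the filled-in row for cell "D" in this sketch.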

A DataFrame with one row for each cell.

Aaron Lun

# Simulate 30 sequences assigned at random to cells labelled A-Z.
df <- data.frame(
    cell.id=sample(LETTERS, 30, replace=TRUE),
    v_gene=sample(c("TRAV1", "TRAV2", "TRAV3"), 30, replace=TRUE),
    j_gene=sample(c("TRAJ4", "TRAJ5", "TRAJ6"), 30, replace=TRUE),
    umi=pmax(1, rpois(30, 1))
)

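# Group the sequences by cell, then keep the sequence with the largest UMI count.
Y <- splitDataFrameByCell(df, "cell.id")
createPerCellDataFrame(Y, "umi")

# Without cov.field, the first sequence of each cell is kept.
createPerCellDataFrame(Y)

# Cells with no sequences are omitted rather than filled with NA.
createPerCellDataFrame(Y, fill=FALSE)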