Description
Count the number of cells that express each unique combination of genes.
Usage

countCellsPerGeneCombo(x, ...)

## S4 method for signature 'ANY'
countCellsPerGeneCombo(
  x,
  gene.field,
  group = NULL,
  downsample = FALSE,
  down.ncells = NULL,
  row.names = TRUE
)

## S4 method for signature 'CompressedSplitDataFrameList'
countCellsPerGeneCombo(x, gene.field, cov.field, group = NULL, ...)
Arguments

x: Any data.frame-like object where each row corresponds to a single cell and contains its representative sequence. Rows with any … Alternatively, a SplitDataFrameList where each DataFrame corresponds to a cell and each row in that DataFrame is a sequence in that cell.
...: For the generic, further arguments to pass to individual methods. For the CompressedSplitDataFrameList method, further arguments to pass to the ANY method.
gene.field: Character vector of names of columns of x containing the genes (e.g., V and J genes) that define each combination.
group: Factor of length equal to the number of cells in x, specifying the group to which each cell belongs.
downsample: Logical scalar indicating whether downsampling should be performed.
down.ncells: Integer scalar indicating the number of cells to downsample each group to. Defaults to the number of cells in the smallest level of group.
row.names: Logical scalar indicating whether row names should be added by concatenating all gene names per combination.
cov.field: String specifying the column of x containing the abundance of each sequence (e.g., its UMI count), used to select the most abundant sequence in each cell. This can also be NULL.
Details

The aim of this function is to generate a count matrix for use in differential "expression" analyses, i.e., to ask whether one particular group of cells expresses a particular gene combination more frequently than another group. This can be useful to examine the effect of particular experimental conditions or the behavior of different cell states, especially if the specific biological function (e.g., antigen) of each gene combination is known in advance.
If cov.field is set, only the most abundant sequence in each cell is used. In contrast, setting cov.field=NULL counts each sequence separately, so one cell may contribute multiple times. It is probably safest to set cov.field to a non-NULL value to avoid complications from dependencies between counts, though any resulting problems are probably minor.
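To make the difference between a non-NULL and a NULL cov.field concrete, here is a minimal base-R sketch on hypothetical toy data. The helpers keep_top() and count_combos() are illustrative stand-ins, not functions from this package, and joining gene names with "_" is an assumption made only for labelling. It mimics the described behavior; it is not the package's implementation.

```r
# Hypothetical toy data: three cells (A, B, C), some with several sequences.
seqs <- data.frame(
  cell.id = c("A", "A", "B", "C", "C"),
  v_gene  = c("TRAV1", "TRAV2", "TRAV1", "TRAV3", "TRAV1"),
  j_gene  = c("TRAJ4", "TRAJ5", "TRAJ4", "TRAJ6", "TRAJ4"),
  umi     = c(5, 2, 3, 4, 1)
)

# Hypothetical helper: keep only the highest-coverage sequence in each cell
# (analogous to setting cov.field="umi").
keep_top <- function(df, cell, cov) {
  sorted <- df[order(df[[cell]], -df[[cov]]), ]  # by cell, then decreasing coverage
  sorted[!duplicated(sorted[[cell]]), ]          # first row per cell is the most abundant
}

# Hypothetical helper: count rows per unique combination of the gene columns.
count_combos <- function(df, genes) {
  key <- do.call(paste, c(df[genes], sep = "_"))  # "_" separator is an assumption
  tab <- table(key)
  setNames(as.integer(tab), names(tab))
}

one_per_cell <- count_combos(keep_top(seqs, "cell.id", "umi"), c("v_gene", "j_gene"))
every_seq    <- count_combos(seqs, c("v_gene", "j_gene"))  # analogous to cov.field=NULL
print(one_per_cell)
print(every_seq)
```

With one sequence per cell the counts sum to the number of cells (3), whereas counting every sequence gives 5 counts because cells A and C each contribute twice.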
Value

A SummarizedExperiment where each row corresponds to a unique gene combination and each column corresponds to a level of group (or to all cells, if group=NULL). The assays contain a single matrix with the number of cells for each gene combination and grouping level, while the rowData contains information about each gene combination.
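The shape of the returned object can be approximated in base R. In this sketch, which uses hypothetical data with one representative sequence per cell, a plain integer matrix stands in for the assay and a data.frame stands in for the rowData; the "_" separator used for row names is an assumption, as this page only says that gene names are concatenated.

```r
# Hypothetical per-cell data: one representative sequence per cell, plus a group label.
cells <- data.frame(
  v_gene = c("TRAV1", "TRAV1", "TRAV2", "TRAV1", "TRAV2", "TRAV2"),
  j_gene = c("TRAJ4", "TRAJ4", "TRAJ5", "TRAJ5", "TRAJ5", "TRAJ5"),
  group  = factor(c("ctrl", "ctrl", "ctrl", "treat", "treat", "treat"))
)

# Label each cell by its gene combination (separator is an assumption).
combo <- paste(cells$v_gene, cells$j_gene, sep = "_")

# Stand-in for the assay: rows are combinations, columns are levels of group.
counts <- unclass(table(combo, cells$group))

# Stand-in for the rowData: the genes making up each combination, in row order.
row_info <- unique(cells[c("v_gene", "j_gene")])
rownames(row_info) <- paste(row_info$v_gene, row_info$j_gene, sep = "_")
row_info <- row_info[rownames(counts), ]

print(counts)
print(row_info)
```

Each column sums to the number of cells in that group, because every cell is counted once.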
Normalization for cell number

Here, expression is defined as the number of cells expressing a gene combination, rather than the more typical number of reads or UMIs assigned to a gene. If sequencing coverage varies between groups, we assume that it has the same scaling effect on the probability of detecting each gene combination, so this effect cancels out after normalizing by the total number of cells in each group.
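A small numeric illustration of this assumption, using hypothetical counts: if one group's detection is uniformly doubled, dividing each column by its total number of cells recovers identical proportions. This only illustrates the scaling argument; it is not the normalization code used by any particular differential analysis package.

```r
# Hypothetical counts: rows are gene combinations, columns are groups.
# Group g2 has every count doubled, e.g., due to better cell capture.
counts <- matrix(
  c(20, 10, 10,   # g1: 40 cells in total
    40, 20, 20),  # g2: 80 cells in total
  nrow = 3,
  dimnames = list(c("TRAV1_TRAJ4", "TRAV1_TRAJ5", "TRAV2_TRAJ5"), c("g1", "g2"))
)

# Scaling normalization: divide each column by its total number of cells.
props <- sweep(counts, 2, colSums(counts), "/")
print(props)
```

Both columns become 0.5, 0.25 and 0.25, so the uniform scaling factor cancels out.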
However, this assumption only holds for differential expression analyses between groups. When comparing other metrics such as diversity values (see summarizeGeneComboCounts), scaling normalization is not sufficient, so we instead downsample all groups to the same total number of cells. This is achieved with downsample=TRUE and the automatically determined down.ncells, which eliminates uninteresting technical differences between groups arising from cell capture efficiency or sample size.
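Downsampling to the smallest group can be sketched in base R as follows. The helper downsample_cells() is hypothetical and only mirrors the described default (down.ncells taken from the smallest group); the package may draw its subsets differently.

```r
set.seed(42)  # downsampling is random; fix the seed for reproducibility

# Hypothetical per-cell data: the gene combination of each cell and its group.
combo <- c(rep("TRAV1_TRAJ4", 30), rep("TRAV2_TRAJ5", 20),  # group A: 50 cells
           rep("TRAV1_TRAJ4", 10), rep("TRAV2_TRAJ5", 10))  # group B: 20 cells
group <- factor(rep(c("A", "B"), c(50, 20)))

# Hypothetical helper: indices of a random subset of `ncells` cells per group,
# defaulting to the size of the smallest group.
downsample_cells <- function(group, ncells = min(table(group))) {
  by_group <- split(seq_along(group), group)
  # Index via sample.int() so that groups of size 1 are handled correctly.
  picked <- lapply(by_group, function(idx) idx[sample.int(length(idx), ncells)])
  unlist(picked, use.names = FALSE)
}

keep <- downsample_cells(group)
down_counts <- table(combo[keep], group[keep])
print(down_counts)
```

After downsampling, both groups contribute 20 cells, so metrics computed per group are no longer confounded by the difference in group size.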
Author(s)

Aaron Lun
Examples

# Toy data: 30 sequences randomly assigned to cells labelled "A" to "Z".
df <- data.frame(
  cell.id=sample(LETTERS, 30, replace=TRUE),
  v_gene=sample(c("TRAV1", "TRAV2", "TRAV3"), 30, replace=TRUE),
  j_gene=sample(c("TRAJ4", "TRAJ5", "TRAJ6"), 30, replace=TRUE),
  umi=pmax(1, rpois(30, 1))
)

# One DataFrame per cell.
y <- splitDataFrameByCell(df, field="cell.id")

# Count cells per V/J combination, using the highest-UMI sequence in each cell.
out <- countCellsPerGeneCombo(y, c("v_gene", "j_gene"), cov.field="umi")
rowData(out)
assay(out)

# Same, but with cells randomly assigned to 10 groups.
out2 <- countCellsPerGeneCombo(y, c("v_gene", "j_gene"), cov.field="umi",
    group=sample(10, length(y), replace=TRUE))
rowData(out2)
assay(out2)