scDD | R Documentation |
Find genes with differential distributions (DD) across two conditions
scDD(SCdat, prior_param = list(alpha = 0.1, mu0 = 0, s0 = 0.01, a0 = 0.01, b0 = 0.01), permutations = 0, testZeroes = TRUE, adjust.perms = FALSE, param = bpparam(), parallelBy = c("Genes", "Permutations"), condition = "condition", min.size = 3, min.nonzero = NULL, level = 0.05, categorize = TRUE)
SCdat |
An object of class |
prior_param |
A list of prior parameter values to be used when modeling each gene as a mixture of DP normals. Default values are given that specify a vague prior distribution on the cluster-specific means and variances. |
permutations |
The number of permutations to be used in calculating
empirical p-values. If set to zero (default),
the full Bayes Factor permutation test will not be performed. Instead,
a fast procedure to identify the genes with significantly different
expression distributions will be performed using the nonparametric
Kolmogorov-Smirnov test, which tests the null hypothesis that
the samples are generated from the same continuous distribution.
This test will yield
slightly lower power than the full permutation testing framework
(this effect is more pronounced at smaller sample
sizes, and is more pronounced in the DB category), but is orders of
magnitude faster. This option
is recommended when compute resources are limited. The remaining
steps of the scDD framework will remain unchanged
(namely, categorizing the significant DD genes into patterns that
represent the major distributional changes,
as well as the ability to visualize the results with violin plots
using the |
testZeroes |
Logical indicating whether or not to test for a difference in the proportion of zeroes. This will only be done for genes that have at least one zero value (genes where all cells have a nonzero value will have a 'zero.pvalue' of NA). |
adjust.perms |
Logical indicating whether or not to adjust the permutation tests for the sample detection rate (proportion of nonzero values). If true, the residuals of a linear model adjusted for detection rate are permuted, and new fitted values are obtained using these residuals. |
param |
a |
parallelBy |
For the permutation test (if invoked), the manner in
which to parallelize. The default option
is |
condition |
A character object that contains the name of the column in
|
min.size |
a positive integer that specifies the minimum size of a
cluster (number of cells) for it to be used
during the classification step. Any clusters containing fewer than
|
min.nonzero |
a positive integer that specifies the minimum number of
nonzero cells in each condition required for the test of differential
distributions. If a gene has fewer nonzero cells per condition, it will
still be tested for DZ (if |
level |
numeric value between 0 and 1 that specifies the alpha level for significance of a differential gene test (default value 0.05). This is used to decide whether to classify a gene into one of the differential patterns. If 'testZeroes' is FALSE and the adjusted p-value for a given gene is below 'level', then the gene is categorized. Alternatively, if 'testZeroes' is TRUE, then the adjusted p-value must be below 'level/2' in order to be considered significant and categorized. This is done to control for multiple testing since 'testZeroes=TRUE' means that each gene is tested for a difference in nonzeroes and zeroes separately. |
categorize |
a logical indicating whether to determine which categories (DE, DP, DM, DB) each gene belongs to (default = TRUE). This can only be set to FALSE if 'permutations' is set to zero, since the full model fitting will automatically be carried out if permutations are run. |
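The 'permutations' argument above notes that, when set to zero, a Kolmogorov-Smirnov test is used in place of the permutation test. A minimal sketch of that kind of comparison in base R, on simulated values for a single gene, might look as follows. The variable names are hypothetical, and excluding zeroes before the test is an assumption drawn from the 'nonzero.pvalue' column description; this is not scDD's internal code.

```r
# Illustrative only: two-sample KS test on the nonzero log-expression values
# of one simulated gene. This is NOT scDD's internal implementation.
set.seed(1)

# Hypothetical log-transformed expression for 100 cells per condition
expr_c1 <- rnorm(100, mean = 2, sd = 0.5)                  # unimodal
expr_c2 <- c(rnorm(50, mean = 2, sd = 0.5),
             rnorm(50, mean = 4, sd = 0.5))                # bimodal

# Assumption: zeroes are dropped before comparing the two distributions
nonzero_c1 <- expr_c1[expr_c1 != 0]
nonzero_c2 <- expr_c2[expr_c2 != 0]

# Null hypothesis: both samples come from the same continuous distribution
ks_res <- ks.test(nonzero_c1, nonzero_c2)
ks_pvalue <- ks_res$p.value
print(ks_pvalue)
```

A small p-value here indicates that the two conditions have different expression distributions for this gene, without saying which pattern (DE, DP, DM, DB) is responsible.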
Find genes with differential distributions (DD) across two conditions. Models each log-transformed gene as a Dirichlet Process Mixture of normals and uses a permutation test to determine whether condition membership is independent of sample clustering. The FDR-adjusted (Benjamini-Hochberg) permutation p-value is returned along with the classification of each significant gene (adjusted p-value less than 0.05, or 0.025 if also testing for a difference in the proportion of zeroes) into one of four categories (DE, DP, DM, DB). For genes that do not show a significant influence of condition on clustering, an optional test of whether the proportion of zeroes (dropout rate) differs across conditions is performed (DZ).
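The significance rule described above (Benjamini-Hochberg adjustment, then a cutoff of 'level' or 'level/2') can be sketched with base R's p.adjust. The gene names and p-values below are made up for illustration, and the snippet only mirrors the documented thresholding rule, not the package's internal code.

```r
# Illustrative only: BH adjustment followed by the 'level' vs 'level/2' cutoff.
# Gene names and p-values are invented for this sketch.
pvals <- c(geneA = 0.0004, geneB = 0.012, geneC = 0.03, geneD = 0.6)
level <- 0.05

# Benjamini-Hochberg adjusted p-values
padj <- p.adjust(pvals, method = "BH")

# testZeroes = FALSE: a gene is significant if its adjusted p-value < level
sig_no_zero_test <- padj < level

# testZeroes = TRUE: the cutoff is halved, since each gene is tested twice
sig_zero_test <- padj < level / 2

print(data.frame(padj, sig_no_zero_test, sig_zero_test))
```

With the halved cutoff, a gene that is borderline at 'level' can drop out of the significant set, which is the intended cost of running the additional test on zeroes.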
A SingleCellExperiment
object that contains the data and
sample information from the input object, but where the results objects
are now added to the metadata
slot. The metadata slot is now a
list with four items: the first (main results object) is a data.frame
with the following columns:
'gene': gene name (matches rownames of SCdat)
'DDcategory': name of the DD (DE, DP, DM, DB, DZ) pattern (or NS = not significant)
'Clusters.combined': the number of clusters identified overall
'Clusters.C1': the number of clusters identified in condition 1 alone
'Clusters.C2': the number of clusters identified in condition 2 alone
'nonzero.pvalue': permutation (or KS) p-value for testing difference in nonzero expression values
'nonzero.pvalue.adj': Benjamini-Hochberg adjusted version of the 'nonzero.pvalue' column
'zero.pvalue': p-value for test of difference in dropout rate (only if 'testZeroes' is TRUE)
'zero.pvalue.adj': Benjamini-Hochberg adjusted version of the previous column (only if 'testZeroes' is TRUE)
'combined.pvalue': Fisher's combined p-value for a difference in nonzero or zero values (only if 'testZeroes' is TRUE)
'combined.pvalue.adj': Benjamini-Hochberg adjusted version of the previous column (only if 'testZeroes' is TRUE)
The remaining three elements are matrices (first for conditions
1 and 2 combined,
then condition 1 alone, then condition 2 alone) that contain the cluster
memberships for each sample (cluster 1, 2, 3, ...) in columns and
genes in rows. Zeroes, which are not involved in the clustering, are
labeled as zero. See the results
function for a convenient
way to extract these results objects.
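The 'combined.pvalue' column is described above as Fisher's combined p-value. As a reminder of how Fisher's method works in general, here is a minimal base R sketch; the helper name fisher_combine and the two input p-values are hypothetical, and the package's own computation may differ in detail.

```r
# Illustrative only: Fisher's method for combining independent p-values.
# 'fisher_combine' is a hypothetical helper, not a function exported by scDD.
fisher_combine <- function(p) {
  # Under the null, -2 * sum(log(p)) follows a chi-square distribution
  # with 2 * k degrees of freedom, where k is the number of p-values
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

nonzero_p <- 0.04   # made-up p-value for the test on nonzero values
zero_p    <- 0.20   # made-up p-value for the test on the proportion of zeroes

combined_p <- fisher_combine(c(nonzero_p, zero_p))
print(combined_p)
```

The combined value summarizes evidence from both tests, so it can be small even when only one of the two individual p-values is.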
Korthauer KD, Chu LF, Newton MA, Li Y, Thomson J, Stewart R, Kendziorski C. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biology. 2016 Oct 25;17(1):222. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1077-y
# load toy simulated example SingleCellExperiment object to find DD genes
data(scDatExSim)

# check that this object is a member of the SingleCellExperiment class
# and that it contains 200 samples and 30 genes
class(scDatExSim)
show(scDatExSim)

# set arguments to pass to scDD function
# we will perform 100 permutations on each of the 30 genes
prior_param=list(alpha=0.01, mu0=0, s0=0.01, a0=0.01, b0=0.01)
nperms <- 100

# call the scDD function to perform permutations, classify DD genes,
# and return results
# we won't perform the test for a difference in the proportion of zeroes
# since none exists in this simulated toy example data
# this step will take significantly longer with more genes and/or
# more permutations
scDatExSim <- scDD(scDatExSim, prior_param=prior_param, permutations=nperms, testZeroes=FALSE)