scDD  R Documentation 
Find genes with differential distributions (DD) across two conditions
scDD(SCdat, prior_param = list(alpha = 0.1, mu0 = 0, s0 = 0.01, a0 = 0.01, b0 = 0.01), permutations = 0, testZeroes = TRUE, adjust.perms = FALSE, param = bpparam(), parallelBy = c("Genes", "Permutations"), condition = "condition", min.size = 3, min.nonzero = NULL, level = 0.05, categorize = TRUE)
SCdat 
An object of class 
prior_param 
A list of prior parameter values to be used when modeling each gene as a Dirichlet process (DP) mixture of normals. The default values specify a vague prior distribution on the cluster-specific means and variances. 
permutations 
The number of permutations to be used in calculating
empirical p-values. If set to zero (default),
the full Bayes Factor permutation test will not be performed. Instead,
a fast procedure to identify the genes with significantly different
expression distributions will be performed using the nonparametric
Kolmogorov-Smirnov test, which tests the null hypothesis that
the samples are generated from the same continuous distribution.
This test will yield
slightly lower power than the full permutation testing framework
(this effect is more pronounced at smaller sample
sizes, and is more pronounced in the DB category), but is orders of
magnitude faster. This option
is recommended when compute resources are limited. The remaining
steps of the scDD framework will remain unchanged
(namely, categorizing the significant DD genes into patterns that
represent the major distributional changes,
as well as the ability to visualize the results with violin plots
using the 
testZeroes 
Logical indicating whether or not to test for a difference in the proportion of zeroes. This will only be done for genes that have at least one zero value (genes where all cells have a nonzero value will have a 'zero.pvalue' of NA). 
adjust.perms 
Logical indicating whether or not to adjust the permutation tests for the sample detection rate (proportion of nonzero values). If true, the residuals of a linear model adjusted for detection rate are permuted, and new fitted values are obtained using these residuals. 
param 
a 
parallelBy 
For the permutation test (if invoked), the manner in
which to parallelize. The default option
is 
condition 
A character object that contains the name of the column in

min.size 
a positive integer that specifies the minimum size of a
cluster (number of cells) for it to be used
during the classification step. Any clusters containing fewer than

min.nonzero 
a positive integer that specifies the minimum number of
nonzero cells in each condition required for the test of differential
distributions. If a gene has fewer nonzero cells per condition, it will
still be tested for DZ (if 
level 
numeric value between 0 and 1 that specifies the alpha level for significance of a differential gene test (default value 0.05). This is used to decide whether to classify a gene into one of the differential patterns. If 'testZeroes' is FALSE and the adjusted p-value for a given gene is below 'level', then the gene is categorized. Alternatively, if 'testZeroes' is TRUE, then the adjusted p-value must be below 'level/2' in order to be considered significant and categorized. This is done to control for multiple testing since 'testZeroes=TRUE' means that each gene is tested for a difference in nonzeroes and zeroes separately. 
categorize 
a logical indicating whether to determine which categories (DE, DP, DM, DB) each gene belongs to (default = TRUE). This can only be set to FALSE if 'permutations' is set to zero, since the full model fitting will automatically be carried out if permutations are run. 
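The fast path described under 'permutations' and the threshold rule described under 'level' can be sketched in base R. This is only an illustration under stated assumptions, not scDD's internal code: the expression values are simulated, the p-values for genes 2 to 4 are made up, and the variable names (expr_c1, nonzero_p, threshold, and so on) are hypothetical.

```r
# Sketch of the fast path (permutations = 0) for one gene, using base R only.
# All data are simulated; this is not scDD's internal implementation.
set.seed(1)

# Hypothetical log-transformed expression for one gene in two conditions
expr_c1 <- c(rep(0, 20), rnorm(80, mean = 2, sd = 0.5))
expr_c2 <- c(rep(0, 20), rnorm(80, mean = 3, sd = 0.5))

# Zeroes are set aside; the test is applied to nonzero values only
nz1 <- expr_c1[expr_c1 != 0]
nz2 <- expr_c2[expr_c2 != 0]

# Two-sample Kolmogorov-Smirnov test of "same continuous distribution"
ks_p <- ks.test(nz1, nz2)$p.value

# Pretend three more genes were tested (these p-values are made up)
nonzero_p <- c(gene1 = ks_p, gene2 = 0.40, gene3 = 0.03, gene4 = 0.75)

# Benjamini-Hochberg adjustment across genes
nonzero_p_adj <- p.adjust(nonzero_p, method = "BH")

# Threshold rule from the 'level' argument: halve it when zeroes are also tested
level <- 0.05
testZeroes <- TRUE
threshold <- if (testZeroes) level / 2 else level

is_dd <- nonzero_p_adj < threshold
print(is_dd)
```

In this sketch only genes whose adjusted p-value falls below the threshold would go on to the categorization step.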
Find genes with differential distributions (DD) across two conditions. Models each log-transformed gene as a Dirichlet Process Mixture of normals and uses a permutation test to determine whether condition membership is independent of sample clustering. The FDR-adjusted (Benjamini-Hochberg) permutation p-value is returned along with the classification of each significant gene (adjusted p-value less than 0.05, or 0.025 if also testing for a difference in the proportion of zeroes) into one of four categories (DE, DP, DM, DB). For genes that do not show a significant influence of condition on clustering, an optional test of whether the proportion of zeroes (dropout rate) differs across conditions is performed (DZ).
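The idea of testing whether condition membership is independent of clustering by permutation can be illustrated with a small base R sketch. Note the substitution: scDD scores each gene with a Bayes factor from the fitted mixture model, whereas this sketch uses a Pearson chi-square statistic on the cluster-by-condition table as a simple stand-in. The cluster labels, the function assoc_stat, and the add-one form of the empirical p-value are all assumptions made for illustration.

```r
# Generic permutation test of independence between cluster and condition.
# Stand-in statistic: Pearson chi-square (scDD itself uses a Bayes factor score).
set.seed(42)

# Hypothetical labels for 100 cells: condition 1 is mostly cluster 1,
# condition 2 is mostly cluster 2
condition <- rep(c(1, 2), each = 50)
cluster <- c(sample(1:2, 50, replace = TRUE, prob = c(0.9, 0.1)),
             sample(1:2, 50, replace = TRUE, prob = c(0.1, 0.9)))

# Association statistic between cluster labels and condition labels
assoc_stat <- function(cluster, condition) {
  tab <- table(cluster, condition)
  unname(suppressWarnings(chisq.test(tab, correct = FALSE)$statistic))
}

observed <- assoc_stat(cluster, condition)

# Null distribution: shuffle condition labels, keep cluster labels fixed
n_perm <- 200
permuted <- replicate(n_perm, assoc_stat(cluster, sample(condition)))

# Empirical p-value (the add-one correction keeps it strictly above zero)
p_emp <- (1 + sum(permuted >= observed)) / (1 + n_perm)
print(p_emp)
```

Because every permutation requires recomputing the statistic, the cost grows with both the number of genes and the number of permutations, which is why the Kolmogorov-Smirnov alternative is so much faster.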
A SingleCellExperiment
object that contains the data and
sample information from the input object, but where the results objects
are now added to the metadata
slot. The metadata slot is now a
list with four items: the first (main results object) is a data.frame
with the following columns:
'gene': gene name (matches rownames of SCdat)
'DDcategory': name of the DD (DE, DP, DM, DB, DZ) pattern (or NS = not significant)
'Clusters.combined': the number of clusters identified overall
'Clusters.C1': the number of clusters identified in condition 1 alone
'Clusters.C2': the number of clusters identified in condition 2 alone
'nonzero.pvalue': permutation (or KS) p-value for testing difference in nonzero expression values
'nonzero.pvalue.adj': Benjamini-Hochberg adjusted version of the 'nonzero.pvalue' column
'zero.pvalue': p-value for test of difference in dropout rate (only if 'testZeroes' is TRUE)
'zero.pvalue.adj': Benjamini-Hochberg adjusted version of the previous column (only if 'testZeroes' is TRUE)
'combined.pvalue': Fisher's combined p-value for a difference in nonzero or zero values (only if 'testZeroes' is TRUE)
'combined.pvalue.adj': Benjamini-Hochberg adjusted version of the previous column (only if 'testZeroes' is TRUE)
The remaining three elements are matrices (first for conditions
1 and 2 combined,
then condition 1 alone, then condition 2 alone) that contain the cluster
memberships for each sample (cluster 1,2,3,...) in columns and
genes in rows. Zeroes, which are not involved in the clustering, are
labeled as zero. See the results
function for a convenient
way to extract these results objects.
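As a worked illustration of the combined p-value columns described above, Fisher's method combines two independent p-values through the statistic -2 * (log(p1) + log(p2)), which follows a chi-square distribution with 4 degrees of freedom under the null. The sketch below uses base R only; the helper name fisher_combine and the example p-values are hypothetical, and whether scDD computes the column in exactly this way is an assumption.

```r
# Fisher's method for combining two independent p-values per gene.
# fisher_combine is a hypothetical helper, not a function exported by scDD.
fisher_combine <- function(p_nonzero, p_zero) {
  stat <- -2 * (log(p_nonzero) + log(p_zero))
  # 2 p-values combined -> chi-square with 2 * 2 = 4 degrees of freedom
  pchisq(stat, df = 4, lower.tail = FALSE)
}

# Made-up p-values for three genes
p_nonzero <- c(0.01, 0.20, 0.50)
p_zero    <- c(0.03, 0.60, 0.50)

combined <- fisher_combine(p_nonzero, p_zero)

# Benjamini-Hochberg adjustment, as for the other '.adj' columns
combined_adj <- p.adjust(combined, method = "BH")

print(data.frame(p_nonzero, p_zero, combined, combined_adj))
```

For two p-values the result has the closed form x * (1 - log(x)) with x = p1 * p2, which gives a quick way to check the calculation by hand.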
Korthauer KD, Chu LF, Newton MA, Li Y, Thomson J, Stewart R, Kendziorski C. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biology. 2016 Oct 25;17(1):222. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1077-y
# load the package that provides scDD() and the example data
library(scDD)

# load toy simulated example SingleCellExperiment object to find DD genes
data(scDatExSim)

# check that this object is a member of the SingleCellExperiment class
# and that it contains 200 samples and 30 genes
class(scDatExSim)
show(scDatExSim)

# set arguments to pass to scDD function
# we will perform 100 permutations on each of the 30 genes
prior_param=list(alpha=0.01, mu0=0, s0=0.01, a0=0.01, b0=0.01)
nperms <- 100

# call the scDD function to perform permutations, classify DD genes,
# and return results
# we won't perform the test for a difference in the proportion of zeroes
# since none exists in this simulated toy example data
# this step will take significantly longer with more genes and/or
# more permutations
scDatExSim <- scDD(scDatExSim, prior_param=prior_param, permutations=nperms,
                   testZeroes=FALSE)