SCnorm | R Documentation |
Quantile regression is used to estimate the dependence of read counts on sequencing depth for every gene. Genes with similar dependence are then grouped, and a second quantile regression is used to estimate scale factors within each group. Within-group adjustment for sequencing depth is then performed using the estimated scale factors to provide normalized estimates of expression. If multiple conditions are provided, normalization is performed within each condition and the normalized estimates are then scaled between conditions. If withinSample is supplied, counts are first corrected within each sample using the method from Risso et al. 2011.
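The first step described above, estimating how one gene's counts depend on sequencing depth, can be sketched as follows. This is only an illustration on simulated data, not SCnorm's internal code: the variable names are hypothetical, the log-log scale and the restriction to non-zero counts are assumptions of this sketch, and it assumes the quantreg package is installed.

```r
# Illustrative sketch only; not SCnorm's internal implementation.
library(quantreg)  # provides rq() for quantile regression

set.seed(1)
n_cells <- 200
depth <- runif(n_cells, min = 5e4, max = 5e5)  # simulated sequencing depth per cell

# Hypothetical gene whose expected count is proportional to depth
counts <- rpois(n_cells, lambda = depth * 1e-3)

# Keep non-zero counts and work on the log scale (assumptions of this sketch)
keep <- counts > 0
d <- data.frame(y = log(counts[keep]), x = log(depth[keep]))

# Median regression, matching the default Tau = 0.5
fit <- rq(y ~ x, tau = 0.5, data = d)

# The slope summarizes how strongly this gene's counts depend on depth
slope <- unname(coef(fit)["x"])
slope
```

A slope near 1 means the counts scale roughly in proportion to depth. Genes with similar slopes are the ones the description says are grouped together before scale factors are estimated.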
SCnorm(
Data = NULL,
Conditions = NULL,
PrintProgressPlots = FALSE,
reportSF = FALSE,
FilterCellNum = 10,
FilterExpression = 0,
Thresh = 0.1,
K = NULL,
NCores = NULL,
ditherCounts = FALSE,
PropToUse = 0.25,
Tau = 0.5,
withinSample = NULL,
useSpikes = FALSE,
useZerosToScale = FALSE
)
Data |
can be a matrix of single-cell expression values
in which rows are genes and columns are cells (samples). Gene names should
not be a column in this matrix, but should be assigned to rownames(Data).
Data can also be an object of class |
Conditions |
vector of condition labels; its entries should correspond to the columns of the expression matrix. |
PrintProgressPlots |
whether to automatically produce plots as SCnorm determines the optimal number of groups (default is FALSE; using TRUE is highly recommended). Plots will be printed to the current device. |
reportSF |
whether to provide a matrix of scale factors in the output (default = FALSE). |
FilterCellNum |
the number of non-zero expression estimates required for a gene to be included in the SCnorm fitting (default = 10). The initial grouping fits a quantile regression to each gene, so setting this value too low gives unstable fits. |
FilterExpression |
exclude genes whose median non-zero expression is below this value from the normalization (default = 0). |
Thresh |
threshold to use in evaluating the sufficiency of K, default is .1. |
K |
the number of groups for normalizing. If left unspecified, an evaluation procedure will determine the optimal value of K (recommended). |
NCores |
number of cores to use, default is detectCores() - 1. This will be used to set up a parallel environment using either MulticoreParam (Linux, Mac) or SnowParam (Windows) with NCores using the package BiocParallel. |
ditherCounts |
whether to dither/jitter the counts, which may be used for data with many ties (default is FALSE). |
PropToUse |
proportion of genes closest to the slope mode used for the group fitting, default is set at .25. This number mainly affects speed. |
Tau |
value of the quantile for the quantile regression used to estimate gene-specific slopes (default is the median, Tau = .5). |
withinSample |
a vector of gene-specific features used to correct counts within a sample prior to SCnorm. If NULL (default), no correction will be performed. Examples of gene-specific features are GC content or gene length. |
useSpikes |
whether to use spike-ins to perform across condition scaling (default=FALSE). Spike-ins must be stored in the SingleCellExperiment object using altExp() function from SingleCellExperiment. See vignette for example. |
useZerosToScale |
whether to use zeros when scaling across conditions (default=FALSE). |
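The input requirements listed above for Data, Conditions, FilterCellNum, and FilterExpression can be illustrated with a small base R sketch on simulated data. The object names are hypothetical, and the use of `>=` for the filter is an assumption; how SCnorm applies these filters internally (for example, per condition) is not shown here.

```r
# Illustrative sketch with simulated data; not SCnorm's internal code.
set.seed(1)
n_genes <- 50
n_cells <- 20

# Expression matrix: rows are genes, columns are cells
Data <- matrix(rpois(n_genes * n_cells, lambda = 2),
               nrow = n_genes, ncol = n_cells)
rownames(Data) <- paste0("Gene", seq_len(n_genes))  # gene names go in rownames
colnames(Data) <- paste0("Cell", seq_len(n_cells))

# One condition label per column of Data
Conditions <- rep(c(1, 2), each = n_cells / 2)
stopifnot(length(Conditions) == ncol(Data))

# Number of non-zero values per gene, the quantity FilterCellNum refers to
FilterCellNum <- 10
n_nonzero <- rowSums(Data != 0)
genes_kept <- rownames(Data)[n_nonzero >= FilterCellNum]  # `>=` is an assumption

# Median of the non-zero values per gene, the quantity FilterExpression refers to
median_nonzero <- apply(Data, 1, function(x) median(x[x != 0]))
```

Checking these quantities before calling SCnorm gives a rough idea of how many genes would have enough non-zero observations for a stable per-gene fit.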
List containing a matrix of normalized expression (and optionally a matrix of size factors if reportSF = TRUE).
Rhonda Bacher
library(SCnorm)
data(ExampleSimSCData)
Conditions <- rep(c(1, 2), each = 45)
# DataNorm <- SCnorm(ExampleSimSCData, Conditions,
#                    FilterCellNum = 10)
# str(DataNorm)