SCnorm: SCnorm
In SCnorm: Normalization of single cell RNA-seq data

Quantile regression is used to estimate the dependence of read counts on sequencing depth for every gene. Genes with similar dependence are then grouped, and a second quantile regression is used to estimate scale factors within each group. Within-group adjustment for sequencing depth is then performed using the estimated scale factors to provide normalized estimates of expression. If multiple conditions are provided, normalization is performed within each condition and the normalized estimates are then scaled between conditions. If withinSample is supplied, the within-sample correction method from Risso et al. 2011 is applied first.
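The first two steps described above (estimate a per-gene count-depth slope, then group genes with similar slopes) can be sketched in base R. This is only an illustration under stated assumptions, not SCnorm's implementation: it swaps in ordinary least squares (`lm`) for quantile regression so that no extra package is needed, it assumes a log-log scale with total counts per cell as the depth measure, and it groups genes by slope quantiles purely for simplicity. All variable names (`depth_factor`, `base_mean`, `slopes`, `groups`) and the simulated data are hypothetical.

```r
set.seed(1)
n_genes <- 50
n_cells <- 40

# Hypothetical simulation: each cell has a relative depth factor and
# each gene a baseline mean, so expected counts scale with depth.
depth_factor <- runif(n_cells, 0.5, 2)
base_mean <- runif(n_genes, 20, 200)
counts <- matrix(
  rpois(n_genes * n_cells, lambda = as.vector(outer(base_mean, depth_factor))),
  nrow = n_genes, ncol = n_cells
)
rownames(counts) <- paste0("Gene", seq_len(n_genes))

# Assumed depth measure: log of total counts per cell.
log_depth <- log(colSums(counts))

# Per-gene slope of log non-zero counts on log depth.
# Least squares is a stand-in here; SCnorm uses quantile regression.
slopes <- apply(counts, 1, function(y) {
  keep <- y > 0
  unname(coef(lm(log(y[keep]) ~ log_depth[keep]))[2])
})

# Group genes with similar slopes (quantile bins, for illustration only).
K <- 3
breaks <- quantile(slopes, probs = seq(0, 1, length.out = K + 1))
groups <- cut(slopes, breaks = breaks, include.lowest = TRUE, labels = FALSE)

table(groups)
```

In the real method a second regression within each group yields the scale factors; that step is omitted from this sketch.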

SCnorm(
  Data = NULL,
  Conditions = NULL,
  PrintProgressPlots = FALSE,
  reportSF = FALSE,
  FilterCellNum = 10,
  FilterExpression = 0,
  Thresh = 0.1,
  K = NULL,
  NCores = NULL,
  ditherCounts = FALSE,
  PropToUse = 0.25,
  Tau = 0.5,
  withinSample = NULL,
  useSpikes = FALSE,
  useZerosToScale = FALSE
)

`Data`	a matrix of single-cell expression in which rows are genes and columns are samples (cells). Gene names should not be a column in this matrix, but should be assigned to rownames(Data). Data can also be an object of class `SummarizedExperiment` that contains the single-cell expression matrix and other metadata. The `assays` slot contains the expression matrix and is named `"Counts"`. This matrix should have one row for each gene and one column for each sample. The `colData` slot should contain a data.frame with one row per sample and columns that contain metadata for each sample. This data.frame should contain a variable that represents biological condition in the same order as the columns of `NormCounts`. Additional information about the experiment can be contained in the `metadata` slot as a list.
`Conditions`	vector of condition labels; its order should correspond to the columns of the expression matrix.
`PrintProgressPlots`	whether to automatically produce plots as SCnorm determines the optimal number of groups (default is FALSE; setting this to TRUE is highly recommended). Plots will be printed to the current device.
`reportSF`	whether to provide a matrix of scale factors in the output (default = FALSE).
`FilterCellNum`	the number of non-zero expression estimates required for a gene to be included in the SCnorm fitting (default = 10). The initial grouping fits a quantile regression to each gene, so setting this value too low gives unstable fits.
`FilterExpression`	exclude genes whose median non-zero expression is below this value from the normalization (default = 0).
`Thresh`	threshold to use in evaluating the sufficiency of K, default is 0.1.
`K`	the number of groups for normalizing. If left unspecified, an evaluation procedure will determine the optimal value of K (recommended).
`NCores`	number of cores to use, default is detectCores() - 1. This will be used to set up a parallel environment using either MulticoreParam (Linux, Mac) or SnowParam (Windows) with NCores using the package BiocParallel.
`ditherCounts`	whether to dither/jitter the counts, may be used for data with many ties, default is FALSE.
`PropToUse`	proportion of genes closest to the slope mode used for the group fitting, default is 0.25. This number mainly affects speed.
`Tau`	value of quantile for the quantile regression used to estimate gene-specific slopes (default is the median, Tau = 0.5).
`withinSample`	a vector of gene-specific features to correct counts within a sample prior to SCnorm. If NULL (default) then no correction will be performed. Examples of gene-specific features are GC content or gene length.
`useSpikes`	whether to use spike-ins to perform across condition scaling (default=FALSE). Spike-ins must be stored in the SingleCellExperiment object using altExp() function from SingleCellExperiment. See vignette for example.
`useZerosToScale`	whether to use zeros when scaling across conditions (default=FALSE).
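The input requirements above (genes in rows, gene names in rownames, `Conditions` aligned with the columns) and the idea behind `FilterCellNum` can be checked before calling SCnorm. The following base R sketch uses a small made-up matrix; the names `counts`, `conditions`, `filter_cell_num`, and `genes_to_fit` are hypothetical, and the filter shown is only an approximation of the rule described above (it counts non-zero values over all cells, whereas SCnorm may apply its filter differently, for example per condition).

```r
set.seed(42)

# Toy count matrix: 6 genes (rows) by 8 cells (columns).
counts <- matrix(rpois(6 * 8, lambda = 2), nrow = 6, ncol = 8)
rownames(counts) <- paste0("Gene", 1:6)   # gene names go in rownames
colnames(counts) <- paste0("Cell", 1:8)

# One condition label per column, in column order.
conditions <- rep(c(1, 2), each = 4)
stopifnot(length(conditions) == ncol(counts))

# Number of non-zero values per gene, compared against a
# hypothetical threshold playing the role of FilterCellNum.
nonzero_per_gene <- rowSums(counts > 0)
filter_cell_num <- 3
genes_to_fit <- names(nonzero_per_gene)[nonzero_per_gene >= filter_cell_num]

genes_to_fit
```

Genes that fall below the threshold have too few non-zero observations for a stable per-gene regression, which is the motivation given for `FilterCellNum`.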

List containing a matrix of normalized expression (and optionally a matrix of size factors if reportSF = TRUE).

Rhonda Bacher

 
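library(SCnorm)

# Load the example data set shipped with the package
data(ExampleSimSCData)

# Two conditions of 45 cells each, in column order
Conditions <- rep(c(1, 2), each = 45)

# Not run:
# DataNorm <- SCnorm(ExampleSimSCData, Conditions,
#                    FilterCellNum = 10)
# str(DataNorm)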