SCnorm: SCnorm

View source: R/SCnorm.R

SCnorm    R Documentation

SCnorm

Description

Quantile regression is used to estimate the dependence of read counts on sequencing depth for every gene. Genes with similar dependence are then grouped, and a second quantile regression is used to estimate scale factors within each group. Within-group adjustment for sequencing depth is then performed using the estimated scale factors to provide normalized estimates of expression. If multiple conditions are provided, normalization is performed within each condition and the normalized estimates are then scaled between conditions. If withinSample is supplied (see Arguments), the within-sample correction method from Risso et al. 2011 is applied before normalization.
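As a rough illustration of the first step only, the sketch below estimates a per-gene slope of log expression on log sequencing depth for a toy matrix. It assumes sequencing depth is the total count per cell, and it substitutes ordinary least squares (lm) for quantile regression so that it runs in base R. It is not SCnorm's implementation, and the object names (counts, depth, gene_slope) are hypothetical.

```r
set.seed(1)

# Toy data: 5 genes x 40 cells, with counts proportional to a per-cell depth factor
n_genes <- 5
n_cells <- 40
depth_factor <- runif(n_cells, 0.5, 2)
base_expr <- c(20, 50, 100, 200, 400)
counts <- sapply(depth_factor, function(d) rpois(n_genes, base_expr * d))
rownames(counts) <- paste0("Gene", seq_len(n_genes))
colnames(counts) <- paste0("Cell", seq_len(n_cells))

# Sequencing depth taken here as total counts per cell (an assumption)
depth <- colSums(counts)

# Slope of log non-zero expression on log depth, one gene at a time.
# Least squares stands in for the quantile regression described above.
gene_slope <- apply(counts, 1, function(y) {
  keep <- y > 0
  unname(coef(lm(log(y[keep]) ~ log(depth[keep])))[2])
})

print(round(gene_slope, 2))
```

Genes whose slopes are similar would then be candidates for the same group; the grouping and the second regression are not shown here.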

Usage

SCnorm(
  Data = NULL,
  Conditions = NULL,
  PrintProgressPlots = FALSE,
  reportSF = FALSE,
  FilterCellNum = 10,
  FilterExpression = 0,
  Thresh = 0.1,
  K = NULL,
  NCores = NULL,
  ditherCounts = FALSE,
  PropToUse = 0.25,
  Tau = 0.5,
  withinSample = NULL,
  useSpikes = FALSE,
  useZerosToScale = FALSE
)

Arguments

Data

a matrix of single-cell expression in which rows are genes and columns are samples (cells). Gene names should not be a column in this matrix, but should be assigned to rownames(Data). Data can also be an object of class SummarizedExperiment that contains the single-cell expression matrix and other metadata. The assays slot contains the expression matrix and is named "Counts". This matrix should have one row for each gene and one column for each sample. The colData slot should contain a data.frame with one row per sample and columns that contain metadata for each sample. This data.frame should contain a variable that represents biological condition, in the same order as the columns of NormCounts. Additional information about the experiment can be contained in the metadata slot as a list.
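A minimal sketch of the plain-matrix input layout, using simulated toy counts. The object name counts and the dimensions are hypothetical; only the layout (genes in rows, gene names in rownames, one condition label per column) follows the description above.

```r
set.seed(2)

# Hypothetical toy counts: 6 genes (rows) x 8 cells (columns)
counts <- matrix(rpois(48, lambda = 5), nrow = 6, ncol = 8)

# Gene names belong in rownames, not in a column of the matrix
rownames(counts) <- paste0("Gene", 1:6)
colnames(counts) <- paste0("Cell", 1:8)

# One condition label per column, in column order
Conditions <- rep(c(1, 2), each = 4)

# Sanity checks before passing these as Data and Conditions
stopifnot(is.matrix(counts), is.numeric(counts))
stopifnot(!is.null(rownames(counts)))
stopifnot(length(Conditions) == ncol(counts))
```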

Conditions

vector of condition labels; this should correspond to the columns of the expression matrix.

PrintProgressPlots

whether to automatically produce plots as SCnorm determines the optimal number of groups (default is FALSE; using TRUE is highly recommended). Plots will be printed to the current device.

reportSF

whether to provide a matrix of scale factors in the output (default = FALSE).

FilterCellNum

the number of non-zero expression estimates required to include a gene in the SCnorm fitting (default = 10). The initial grouping fits a quantile regression to each gene, so making this value too low gives unstable fits.
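The filter can be pictured with a small base R sketch that counts non-zero cells per gene on hypothetical toy data. Whether SCnorm treats the boundary as inclusive (>=) is an assumption here, as are the object names.

```r
# Hypothetical toy counts: 4 genes x 12 cells
counts <- rbind(
  GeneA = c(5, 3, 0, 8, 2, 7, 1, 4, 6, 9, 3, 2),
  GeneB = c(0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0),
  GeneC = c(4, 0, 6, 2, 3, 5, 0, 7, 1, 8, 2, 3),
  GeneD = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)
FilterCellNum <- 10

# Number of cells with non-zero expression for each gene
n_nonzero <- rowSums(counts != 0)

# Genes with enough non-zero cells to be fit (inclusive boundary assumed)
genes_to_fit <- names(n_nonzero)[n_nonzero >= FilterCellNum]

print(n_nonzero)
print(genes_to_fit)
```

Running a check like this before normalization shows how many genes a given FilterCellNum would leave for fitting.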

FilterExpression

exclude genes whose median non-zero expression is below this value from the normalization (default = 0).
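For illustration, the median of non-zero expression can be computed per gene as in the sketch below. The toy data, the object names, and the reading that genes with a median below FilterExpression are the ones excluded are assumptions, not taken from the SCnorm source.

```r
# Hypothetical toy counts: 3 genes x 5 cells
counts <- rbind(
  GeneA = c(0, 0, 4, 6, 8),
  GeneB = c(1, 0, 1, 2, 0),
  GeneC = c(0, 0, 0, 0, 0)
)
FilterExpression <- 2

# Median over non-zero values only; NA when a gene has no non-zero values
med_nonzero <- apply(counts, 1, function(y) {
  nz <- y[y != 0]
  if (length(nz) == 0) NA_real_ else median(nz)
})

# Keep genes whose non-zero median reaches the threshold (assumed direction)
keep <- !is.na(med_nonzero) & med_nonzero >= FilterExpression

print(med_nonzero)
print(names(keep)[keep])
```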

Thresh

threshold to use in evaluating the sufficiency of K (default = 0.1).

K

the number of groups for normalizing. If left unspecified, an evaluation procedure will determine the optimal value of K (recommended).

NCores

number of cores to use, default is detectCores() - 1. This will be used to set up a parallel environment using either MulticoreParam (Linux, Mac) or SnowParam (Windows) with NCores using the package BiocParallel.
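A small sketch of choosing a value to pass explicitly as NCores, using parallel::detectCores() from the parallel package that ships with R. The fallback to one core when detection fails is an assumption for this example, not documented SCnorm behavior.

```r
# detectCores() may return NA when the core count cannot be determined
n_detected <- parallel::detectCores()

# Leave one core free, but never go below one
NCores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)

print(NCores)
```

Setting NCores explicitly can be useful on shared machines where using all but one core is not appropriate.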

ditherCounts

whether to dither/jitter the counts; may be used for data with many ties (default is FALSE).

PropToUse

proportion of genes closest to the slope mode used for the group fitting (default = 0.25). This number mainly affects speed.

Tau

value of quantile for the quantile regression used to estimate gene-specific slopes (default is the median, Tau = 0.5).

withinSample

a vector of gene-specific features to correct counts within a sample prior to SCnorm. If NULL (default), no correction will be performed. Examples of gene-specific features are GC content or gene length.

useSpikes

whether to use spike-ins to perform across-condition scaling (default = FALSE). Spike-ins must be stored in the SingleCellExperiment object using the altExp() function from SingleCellExperiment. See the vignette for an example.

useZerosToScale

whether to use zeros when scaling across conditions (default = FALSE).

Value

List containing matrix of normalized expression (and optionally a matrix of size factors if reportSF = TRUE).

Author(s)

Rhonda Bacher

Examples

 
data(ExampleSimSCData)
Conditions <- rep(c(1, 2), each = 45)
# DataNorm <- SCnorm(ExampleSimSCData, Conditions,
#                    FilterCellNum = 10)
# str(DataNorm)

rhondabacher/SCnorm documentation built on July 8, 2023, 11:36 p.m.