preprocess: preprocess

View source: R/Preprocess.R

preprocess    R Documentation



Function to preprocess a SingleCellExperiment object to (1) keep only genes whose proportion of zero entries does not exceed a given threshold, and (2) optionally apply a normalization procedure.


preprocess(SCdat, condition = "condition", zero.thresh = 0.9,
  scran_norm = FALSE, median_norm = FALSE)



SCdat: An object of class SingleCellExperiment that contains single-cell expression and metadata. The assays slot contains a named list of matrices, where the normalized counts are housed in the one named normcounts, and unnormalized counts are stored in the one named counts. If either scran_norm or median_norm is set to TRUE, the normcounts slot will be created from the counts slot. The counts and normalized counts matrices should have one row for each gene and one column for each sample. The colData slot should contain a data.frame with one row per sample and columns that contain metadata for each sample. This data.frame should contain a variable that represents biological condition, in the form of numeric values (either 1 or 2) that indicate which condition each sample belongs to (in the same order as the columns of normcounts). Optional additional metadata about each cell can also be contained in this data.frame, and additional information about the experiment can be contained in the metadata slot as a list.


condition: A character object that contains the name of the column in colData that represents the biological group or condition of interest (e.g. treatment versus control). Note that this variable should only contain two possible values since scDD can currently only handle two-group comparisons. The default option assumes that there is a column named "condition" that contains this variable.
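
The input layout described above can be illustrated without any Bioconductor packages. The following is a minimal base R sketch, assuming a plain matrix and data.frame stand in for the counts assay and colData; the helper check_inputs is hypothetical and is not part of scDD.

```r
# Hypothetical toy inputs: 3 genes (rows) by 4 samples (columns).
counts <- matrix(
  c(0, 3, 1, 0,
    5, 2, 0, 7,
    1, 0, 0, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Per-sample metadata: one row per sample, in the same order as the columns.
col_data <- data.frame(
  condition = c(1, 1, 2, 2),
  row.names = colnames(counts)
)

# Hypothetical helper (not an scDD function) that checks the described layout.
check_inputs <- function(counts, col_data, condition = "condition") {
  stopifnot(
    is.matrix(counts),
    nrow(col_data) == ncol(counts),
    identical(rownames(col_data), colnames(counts)),
    condition %in% colnames(col_data),
    all(col_data[[condition]] %in% c(1, 2)),
    length(unique(col_data[[condition]])) == 2
  )
  invisible(TRUE)
}

check_inputs(counts, col_data)
```

The check stops with an error if the condition column is missing, holds values other than 1 and 2, or does not contain both groups.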


zero.thresh: A numeric value between 0 and 1 that represents the maximum proportion of zeroes per gene allowable in the processed dataset.
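
To make the threshold concrete, here is a minimal base R sketch of filtering genes by their proportion of zeroes. The toy matrix and the variable names are illustrative assumptions, and treating a gene exactly at the threshold as retained is also an assumption based on the wording "maximum proportion allowable"; this is not the package's own code.

```r
# Toy count matrix: one row per gene, one column per sample (hypothetical data).
counts <- matrix(
  c(0, 0, 0, 5,   # geneA: 75% zeroes
    1, 2, 0, 4,   # geneB: 25% zeroes
    0, 0, 0, 0),  # geneC: 100% zeroes
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# Proportion of zero entries per gene.
prop_zero <- rowMeans(counts == 0)

# Keep genes whose zero proportion does not exceed the threshold
# (assumption: a gene exactly at the threshold is kept).
zero_thresh <- 0.5
filtered <- counts[prop_zero <= zero_thresh, , drop = FALSE]
```

With a threshold of 0.5 only geneB is retained; raising the threshold to 0.75 would also keep geneA.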


scran_norm: Logical indicating whether or not to normalize the data using scran normalization from the scran package.


median_norm: Logical indicating whether or not to normalize the data using median normalization from the EBSeq package.
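
For intuition about median normalization, the following base R sketch computes median-of-ratios size factors on a toy matrix. It illustrates the general idea only, under the assumption that each sample is scaled by the median ratio of its counts to the per-gene geometric means; it does not call EBSeq and may differ in detail from what the package does.

```r
# Hypothetical counts: genes in rows, samples in columns.
# cell2 is cell1 scaled by 2, cell3 is cell1 scaled by 0.5.
counts <- matrix(
  c(10, 20,  5,
    20, 40, 10,
    30, 60, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)

# Per-gene geometric mean across samples (zero if any count is zero).
geo_means <- exp(rowMeans(log(counts)))

# Size factor per sample: median ratio of its counts to the geometric means,
# using only genes with a positive geometric mean.
size_factors <- apply(counts, 2, function(cnts) {
  median((cnts / geo_means)[geo_means > 0])
})

# Divide each column by its size factor.
normcounts <- sweep(counts, 2, size_factors, "/")
```

Because the three toy samples differ only by a scaling factor, all columns of normcounts coincide after normalization. In sparse single-cell data many genes have a geometric mean of zero and are excluded from the median, which is one reason to filter on the proportion of zeroes first.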


An object of class SingleCellExperiment in which genes are removed if their proportion of zeroes exceeds zero.thresh, and in which the normcounts assay is added if either scran_norm or median_norm is set to TRUE and only counts is provided. If normcounts already exists and either scran_norm or median_norm is set to TRUE, then the new normalized counts are placed in the normcounts assay slot, and the original values are moved to a new slot called normcounts-orig.
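
The handling of an existing normcounts assay can be sketched with a plain named list standing in for the assays slot. This is an illustration of the behavior described above, not the package's implementation; the helper store_normalized is hypothetical, and a real SingleCellExperiment stores its assays differently.

```r
# Stand-in for the assays of a SingleCellExperiment: a plain named list
# (hypothetical; used here only to show the naming behavior).
assays <- list(
  counts     = matrix(c(2, 4, 6, 8), nrow = 2),
  normcounts = matrix(c(1, 2, 3, 4), nrow = 2)
)

# Hypothetical helper mirroring the described behavior.
store_normalized <- function(assays, new_norm) {
  if (!is.null(assays[["normcounts"]])) {
    # Preserve the previous values under the backup name.
    assays[["normcounts-orig"]] <- assays[["normcounts"]]
  }
  assays[["normcounts"]] <- new_norm
  assays
}

# Placeholder "normalization": halve the raw counts.
updated <- store_normalized(assays, assays$counts / 2)
names(updated)
```

After the call, the list holds counts, the new normcounts, and the earlier values under normcounts-orig.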


Korthauer KD, Chu LF, Newton MA, Li Y, Thomson J, Stewart R, Kendziorski C. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biology. 2016 Oct 25;17(1):222.


 # load toy example SingleCellExperiment object
 library(scDD)
 data(scDatEx)
 # apply the preprocess function to filter out genes if they have more than
 # 75% zero
 scDatEx <- preprocess(scDatEx, zero.thresh=0.75)
 # apply the preprocess function again, this time filtering out genes with
 # more than 75% zeroes (zero.thresh = 0.75) and also normalizing
 # set the scran_norm argument to TRUE to return scran normalized counts
 scDatEx.scran <- preprocess(scDatEx, zero.thresh=0.75, scran_norm=TRUE)
 # set the median_norm argument to TRUE to return Median normalized counts
 scDatEx.median <- preprocess(scDatEx, zero.thresh=0.75, median_norm=TRUE)

kdkorthauer/scDD documentation built on March 27, 2022, 5:11 a.m.