normalize_data: Normalize raw counts.
In marcalva/diem: Debris-Containing Droplet Identification using EM

normalize_data

R Documentation

Normalize raw counts.

Description

Normalization of raw counts in an SCE object. Normalization is performed for the initialization of the EM. The initialization involves clustering the PCs of the test set using k-means. The PCs are calculated from the normalized counts.

Usage

normalize_data(x, droplets.use = NULL, genes.use = NULL,
  use_var = FALSE, sf = "median", logt = TRUE)

Arguments

`x`	An SCE object.
`droplets.use`	A character vector of droplet IDs to subset the counts data. Normalization will only be run on these droplets.
`genes.use`	A character vector of gene names to subset the counts data. Normalization will only be run for these genes.
`use_var`	A logical indicating whether to subset the data to include only variable genes. This overrides `genes.use`. Subsetting to variable genes may better identify cell types. The default shown in Usage is FALSE.
`sf`	Either a numeric scaling factor by which counts are multiplied after division by column sums, or "median" to multiply by the median of the total read/UMI counts across droplets (default).
`logt`	A logical specifying whether to log(x+1) transform counts after size normalization. Default is TRUE.

Details

Genes can be restricted with genes.use, or to variable genes only when use_var is TRUE. The data is normalized by dividing the counts in each droplet by the total counts of that droplet. The result is then multiplied by a scaling factor, given by sf (the median of total counts by default). Finally, if logt is TRUE, the data is log transformed after adding a constant value of 1.
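
The three steps described above (divide by per-droplet totals, multiply by a scaling factor, log-transform) can be sketched in base R. This is a minimal illustration, not the package's implementation: the function name `normalize_counts_sketch` and the toy matrix are hypothetical, and the sketch assumes a dense genes-by-droplets matrix in which every droplet has a nonzero total.

```r
# Hypothetical base-R sketch of the normalization steps; not diem's own code.
# Assumes `counts` is a dense numeric matrix with genes in rows and
# droplets in columns.
normalize_counts_sketch <- function(counts, sf = "median", logt = TRUE) {
  totals <- colSums(counts)            # total counts per droplet
  if (identical(sf, "median")) {
    sf <- median(totals)               # default: median of droplet totals
  }
  stopifnot(is.numeric(sf), length(sf) == 1, all(totals > 0))
  # Divide each column by its total, then multiply by the scaling factor
  norm <- sweep(counts, 2, totals, "/") * sf
  # Optional log(x + 1) transform
  if (logt) norm <- log1p(norm)
  norm
}

# Toy data: 3 genes x 3 droplets (the matrix is filled column by column)
counts <- matrix(c(1, 3, 0,
                   2, 2, 4,
                   5, 0, 5),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("drop", 1:3)))

normed <- normalize_counts_sketch(counts)
```

With `logt = FALSE`, every column of the result sums to the scaling factor, which is a quick check that the size normalization behaved as intended.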