pagoda.varnorm: Normalize gene expression variance relative to...
In hms-dbmi/scde: Single Cell Differential Expression

pagoda.varnorm

R Documentation

Normalize gene expression variance relative to transcriptome-wide expectations

Description

Normalizes gene expression magnitudes so that, for each gene, the ratio of its variance to the transcriptome-wide expectation follows chi-squared statistics. The expectation is determined by local regression on expression magnitude (and optionally gene length). Also corrects for batch effects.
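The idea behind the description above can be illustrated with a minimal base R sketch. It is not the scde implementation: the data are simulated, `loess` stands in for whatever local regression the package uses internally, and the names `expr`, `var.ratio` and `p.over` are hypothetical. The sketch fits the expected variance as a smooth function of expression magnitude and then treats the scaled observed-to-expected ratio as approximately chi-squared distributed.

```r
set.seed(1)
n.cells <- 50
n.genes <- 500

# Simulated (hypothetical) log-scale expression: noise shrinks as magnitude grows
magnitude <- runif(n.genes, min = 0, max = 8)
noise.sd <- 1.5 / (1 + magnitude / 2)
expr <- matrix(rnorm(n.genes * n.cells, mean = magnitude, sd = noise.sd),
               nrow = n.genes)  # rows are genes, columns are cells

gene.mean <- rowMeans(expr)
gene.var <- apply(expr, 1, var)

# Transcriptome-wide expectation: local regression of log variance on magnitude
fit <- loess(log(gene.var) ~ gene.mean)
expected.var <- exp(fitted(fit))

# Ratio of observed to expected variance. For a typical gene,
# n.df * ratio is approximately chi-squared with n.df degrees of freedom.
n.df <- n.cells - 1
var.ratio <- gene.var / expected.var

# Upper-tail probability: small values suggest overdispersed genes
p.over <- pchisq(n.df * var.ratio, df = n.df, lower.tail = FALSE)
summary(var.ratio)
```

In this simulation no gene is truly overdispersed, so the ratios should cluster around 1. With real data, the effective degrees of freedom would come from the fitted error models rather than from the cell count alone.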

Usage

pagoda.varnorm(models, counts, batch = NULL, trim = 0, prior = NULL,
  fit.genes = NULL, plot = TRUE, minimize.underdispersion = FALSE,
  n.cores = detectCores(), n.randomizations = 100, weight.k = 0.9,
  verbose = 0, weight.df.power = 1, smooth.df = -1, max.adj.var = 10,
  theta.range = c(0.01, 100), gene.length = NULL)

Arguments

`models`	model matrix (select a subset of rows to normalize variance within a subset of cells)
`counts`	read count matrix
`batch`	measurement batch (optional)
`trim`	trim value for Winsorization (optional, can be set to 1-3 to reduce the impact of outliers, can be as large as 5 or 10 for datasets with several thousand cells)
`prior`	expression magnitude prior
`fit.genes`	a vector of gene names which should be used to establish the variance fit (default is NULL: use all genes). This can be used to specify, for instance, a set of spike-in control transcripts such as ERCC.
`plot`	whether to plot the results
`minimize.underdispersion`	whether underdispersion should be minimized (can increase sensitivity in datasets with high complexity of population, however cannot be effectively used in datasets where multiple batches are present)
`n.cores`	number of cores to use
`n.randomizations`	number of bootstrap sampling rounds to use in estimating average expression magnitude for each gene within the given set of cells
`weight.k`	k value to use in the final weight matrix
`verbose`	verbosity level
`weight.df.power`	power factor to use in determining effective number of degrees of freedom (can be increased for datasets exhibiting particularly high levels of noise at low expression magnitudes)
`smooth.df`	degrees of freedom to be used in calculating smoothed local regression between coefficient of variation and expression magnitude (and gene length, if provided). Leave at -1 for automated guess.
`max.adj.var`	maximum value allowed for the estimated adjusted variance (capping of adjusted variance is recommended when scoring pathway overdispersion relative to randomly sampled gene sets)
`theta.range`	valid theta range (should be the same as was set in the knn.error.models() call)
`gene.length`	optional vector of gene lengths (corresponding to the rows of counts matrix)
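To make the `trim` argument more concrete, the following base R sketch shows what Winsorization does to a small matrix. The helper `winsorize.rows` is hypothetical and written only for illustration. It assumes `trim` is a whole number of extreme values to clamp on each side of every gene; the routine used inside scde may handle the value differently (the example below passes a fraction).

```r
# Hypothetical helper: clamp the `trim` most extreme values on each side
# of every row (gene) to the nearest remaining value.
winsorize.rows <- function(mat, trim) {
  stopifnot(trim >= 0, 2 * trim < ncol(mat))
  if (trim == 0) return(mat)
  t(apply(mat, 1, function(x) {
    s <- sort(x)
    lo <- s[trim + 1]          # smallest value that is kept
    hi <- s[length(s) - trim]  # largest value that is kept
    pmin(pmax(x, lo), hi)
  }))
}

m <- rbind(geneA = c(1, 2, 3, 4, 100),
           geneB = c(0, 5, 5, 6, 7))
w <- winsorize.rows(m, trim = 1)
print(w)
```

Here the outlying value 100 in `geneA` is pulled down to 4, so a single extreme cell no longer dominates that gene's variance estimate.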

Value

a list containing the following fields:

`mat`	adjusted expression magnitude values
`matw`	weight matrix corresponding to the expression matrix
`arv`	a vector giving adjusted variance values for each gene
`avmodes`	a vector of estimated average expression magnitudes for each gene
`modes`	a list of batch-specific average expression magnitudes for each gene
`prior`	estimated (or supplied) expression magnitude prior
`edf`	estimated effective degrees of freedom
`fit.genes`	the fit.genes parameter

Examples

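library(scde)

# Load the example dataset and filter out poor cells and genes
data(pollen)
cd <- clean.counts(pollen)

# Fit per-cell error models, then normalize variance using those models
knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)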

hms-dbmi/scde documentation built on April 19, 2023, 10:21 p.m.