pagoda.varnorm | R Documentation |
Normalizes gene expression magnitudes to ensure that the variance follows chi-squared statistics with respect to its ratio to the transcriptome-wide expectation as determined by local regression on expression magnitude (and optionally gene length). Corrects for batch effects.
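The chi-squared reference mentioned above can be illustrated with a generic textbook case. This is a minimal sketch in base R on simulated Gaussian data, not the procedure pagoda.varnorm uses internally; all variable names (n_cells, sigma2, stat, p_over) are illustrative assumptions. It shows that when the observed variance is divided by its expected value and scaled by the degrees of freedom, the result follows a chi-squared distribution, which is what makes an upper-tail probability a usable overdispersion score.

```r
# Generic illustration (simulated data, not scde internals):
# for Gaussian samples, (n - 1) * s^2 / sigma^2 follows chi-squared with n - 1 df.
set.seed(1)
n_cells <- 20      # observations per simulated "gene"
n_genes <- 5000    # number of simulated "genes"
sigma2  <- 4       # expected (true) variance shared by all simulated genes

x <- matrix(rnorm(n_genes * n_cells, mean = 10, sd = sqrt(sigma2)),
            nrow = n_genes)

s2   <- apply(x, 1, var)               # observed per-gene variance
stat <- (n_cells - 1) * s2 / sigma2    # variance ratio scaled by df

# The mean of a chi-squared variable equals its degrees of freedom.
mean(stat)

# Upper-tail probability: small values flag more variance than expected.
p_over <- pchisq(stat, df = n_cells - 1, lower.tail = FALSE)
mean(p_over < 0.05)
```

Because every simulated gene here has exactly the expected variance, roughly 5% of the upper-tail probabilities fall below 0.05.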
pagoda.varnorm(models, counts, batch = NULL, trim = 0, prior = NULL,
fit.genes = NULL, plot = TRUE, minimize.underdispersion = FALSE,
n.cores = detectCores(), n.randomizations = 100, weight.k = 0.9,
verbose = 0, weight.df.power = 1, smooth.df = -1, max.adj.var = 10,
theta.range = c(0.01, 100), gene.length = NULL)
models |
model matrix (select a subset of rows to normalize variance within a subset of cells) |
counts |
read count matrix |
batch |
measurement batch (optional) |
trim |
trim value for Winsorization (optional, can be set to 1-3 to reduce the impact of outliers, can be as large as 5 or 10 for datasets with several thousand cells) |
prior |
expression magnitude prior |
fit.genes |
a vector of gene names which should be used to establish the variance fit (default is NULL: use all genes). This can be used to specify, for instance, a set of spike-in control transcripts such as ERCC. |
plot |
whether to plot the results |
minimize.underdispersion |
whether underdispersion should be minimized (can increase sensitivity in datasets with a highly complex population, but cannot be used effectively in datasets where multiple batches are present) |
n.cores |
number of cores to use |
n.randomizations |
number of bootstrap sampling rounds to use in estimating average expression magnitude for each gene within the given set of cells |
weight.k |
k value to use in the final weight matrix |
verbose |
verbosity level |
weight.df.power |
power factor to use in determining effective number of degrees of freedom (can be increased for datasets exhibiting particularly high levels of noise at low expression magnitudes) |
smooth.df |
degrees of freedom to be used in calculating smoothed local regression between coefficient of variation and expression magnitude (and gene length, if provided). Leave at -1 for automated guess. |
max.adj.var |
maximum value allowed for the estimated adjusted variance (capping of adjusted variance is recommended when scoring pathway overdispersion relative to randomly sampled gene sets) |
theta.range |
valid theta range (should be the same as was set in the knn.error.models() call) |
gene.length |
optional vector of gene lengths (corresponding to the rows of the counts matrix) |
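To make the trim argument more concrete, here is a minimal base R sketch of count-based Winsorization applied to each gene (row). The helper winsorize_rows is hypothetical and is not part of scde; the package's own implementation may differ in detail. The toy matrix is invented for illustration.

```r
# Hypothetical helper (not part of scde): clamp the n smallest and n largest
# values in each row to the nearest value that is retained.
winsorize_rows <- function(mat, n = 1) {
  stopifnot(is.matrix(mat), n >= 0, 2 * n < ncol(mat))
  if (n == 0) return(mat)
  t(apply(mat, 1, function(x) {
    s  <- sort(x)
    lo <- s[n + 1]              # smallest value that is kept as-is
    hi <- s[length(s) - n]      # largest value that is kept as-is
    pmin(pmax(x, lo), hi)       # pull the extremes in to those bounds
  }))
}

# Toy counts: geneA has one extreme outlier in the last cell.
toy <- rbind(geneA = c(0, 2, 3, 4, 250),
             geneB = c(1, 1, 2, 2, 3))

winsorize_rows(toy, n = 1)
# geneA becomes 2 2 3 4 4; geneB becomes 1 1 2 2 2
```

Clamping rather than dropping the extreme values keeps the matrix dimensions intact while limiting how much a single cell can inflate a gene's variance estimate.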
Returns a list containing the following fields:
mat adjusted expression magnitude values
matw weight matrix corresponding to the expression matrix
arv a vector giving adjusted variance values for each gene
avmodes a vector of estimated average expression magnitudes for each gene
modes a list of batch-specific average expression magnitudes for each gene
prior estimated (or supplied) expression magnitude prior
edf estimated effective degrees of freedom
fit.genes the fit.genes parameter as supplied
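One common way to use the returned list is to rank genes by their adjusted variance (the arv field). The sketch below uses a toy stand-in list with invented values so that it runs without scde or any data; with a real result, the same sort() call would be applied to the arv field of the object returned by pagoda.varnorm.

```r
# Toy stand-in for the returned list; only two of the documented fields are
# mocked, and all values are invented for illustration.
toy_varinfo <- list(
  mat = matrix(c(0.1, -0.3, 1.2, 0.4, -0.8, 0.9), nrow = 3,
               dimnames = list(c("g1", "g2", "g3"), c("cell1", "cell2"))),
  arv = c(g1 = 0.8, g2 = 3.1, g3 = 1.2)
)

# Rank genes from most to least overdispersed by adjusted variance.
top <- sort(toy_varinfo$arv, decreasing = TRUE)
head(names(top), 2)
# "g2" "g3"
```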
data(pollen)
cd <- clean.counts(pollen)
knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)