qmod: Run QuSAGE on limma or DESeq2 model objects

Description Usage Arguments Details Value References Examples

Description

This function is a wrapper for the QuSAGE algorithm, which tests for pathway enrichment, designed for easy integration with limma and DESeq2.

Usage

1
2
qmod(fit, dat = NULL, filter = NULL, coef, contrast = NULL, geneSets,
  n.points = 2^14)

Arguments

fit

An object of class limma::MArrayLM, as created by a call to eBayes, or a DESeqDataSet that has been fit with a negative binomial GLM. See Details.

dat

An expression matrix or matrix-like object, with rows corresponding to probes and columns to samples. Only necessary if fit is an MArrayLM object.

filter

Numeric vector of length 2 specifying the filter criterion. Each probe must have at least filter[1] log2-counts per million in at least filter[2] libraries to pass the expression threshold. Only relevant if fit is a DESeqDataSet, in which case the normality of transformed residuals at various values of filter should be checked prior to running qmod, for example using check_resid. See Details.

coef

Column name or number specifying which coefficient of the model is of interest. Alternatively, a vector of three or more such strings or numbers, in which case pathways are ranked by the F-statistic for that set of coefficients.

contrast

Character or numeric vector of length two, specifying the column names or numbers to be contrasted. The first and second elements will be the numerator and denominator, respectively, of the fold change calculation. Exactly one of coef or contrast must be NULL.

geneSets

A named list of one or several gene sets.

n.points

The number of points at which to sample the convoluted t-distribution. See Details.

Details

QuSAGE

### INTRO TO QMOD? ### qmod combines the

the statistical flexibility and empirical Bayes methods of limma with the specificity and sensitivity of the QuSAGE algorithm for detecting pathway enrichment. Simulations have shown that this pipeline outperforms each package's independent methods for gene set analysis in complex experimental designs, namely camera and link[qusage]{qgen}. See Watson & John, forthcoming.

### SOMETHING ON MArrayLM vs. DESeqDataSet OBJECTS ###

If fit is a voom object, make sure to run lcpm(dat).

By default n.points is set to 2^14, or 16,384 points, which will give very accurate p-values in most cases. Sampling at more points will increase the accuracy of the resulting p-values, but will also linearly increase the amount of time needed to calculate the result. With larger sample sizes, as few as 1/4 this number of points can be used without seriously affecting the accuracy of the resulting p-values. However, when there are a small number of samples (i.e., fewer than 8 samples total), the t-distribution must be sampled over a much wider range, and the number of points needed for sampling should be increased accordingly. It is recommended that when running qmod with fewer than 8 samples, the number of points be increased to at least 2^15 or 2^16. It may also be useful to test higher values of this parameter, as it can often result in a much more significant p-value with small sample sizes.

Value

A data frame with the following columns:

References

Yaari, G. Bolen, C.R., Thakar, J. & Kleinstein, S.H. (2013). "Quantitative set analysis for gene expression: a method to quantify gene set differential expression including gene-gene correlations." Nucleic Acids Res., 41(18): e170. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3794608/

Turner, J.A., Bolen, C.R. & Blankenship, D.M. (2015). "Quantitative gene set analysis generalized for repeated measures, confounder adjustment, and continuous covariates." BMC Bioinformatics, 16(1): 272. http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0707-9

Smyth, G.K. (2004). "Linear models and empirical Bayes methods for assessing differential expression in microarray experiments." Stat. Appl. Genet. Molec. Biol., 3(1). http://www.statsci.org/smyth/pubs/ebayes.pdf

Love, M., Huber, W., & Anders, S. (2014). "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome Biology, 15:550. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8

Benjamini, Y., & Hochberg, Y. (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing." Journal of the Royal Statistical Society, Series B, 57:289–300.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
# Fit limma model
library(limma)
eset <- matrix(rnorm(5000 * 10), nrow = 5000, ncol = 10,
               dimnames = list(seq_len(5000), paste0('S', seq_len(10))))
clin <- data.frame(treat = rep(c("ctl", "trt"), each = 5),
                   batch = rep(c("b1", "b2"), times = 5),
                      x1 = rnorm(10),
                      x2 = runif(10))
des <- model.matrix(~ treat + batch + x1 + x2, data = clin)
fit <- eBayes(lmFit(eset, des))

# Create list of differentially expressed pathways
geneSets = list()
for (i in 0:10) {
  genes <- ((30 * i) + 1):(30 * (i + 1))
  eset[genes, clin$treat == "trt"] <- eset[genes, clin$treat == "trt"] + rnorm(1)
  geneSets[[paste("Set", i)]] <- genes
}

# Run qmod
top <- qmod(fit, eset, coef = 2, geneSets)

dswatson/biowrapr documentation built on May 15, 2019, 4:52 p.m.