Description Usage Arguments Details Value References Examples
This function is a wrapper for the QuSAGE algorithm, which tests for pathway enrichment, designed for easy integration with limma and DESeq2.
1 2 |
fit |
An object of class |
dat |
An expression matrix or matrix-like object, with rows corresponding to
probes and columns to samples. Only necessary if |
filter |
Numeric vector of length 2 specifying the filter criterion. Each
probe must have at least |
coef |
Column name or number specifying which coefficient of the model is of interest. Alternatively, a vector of three or more such strings or numbers, in which case pathways are ranked by the F-statistic for that set of coefficients. |
contrast |
Character or numeric vector of length two, specifying the column
names or numbers to be contrasted. The first and second elements will be the
numerator and denominator, respectively, of the fold change calculation.
Exactly one of |
geneSets |
A named list of one or several gene sets. |
n.points |
The number of points at which to sample the convoluted t-distribution. See Details. |
QuSAGE
### INTRO TO QMOD? ###
qmod
combines the
the statistical flexibility and empirical Bayes methods of
limma
with the specificity and sensitivity of the QuSAGE algorithm for
detecting pathway enrichment. Simulations have shown that this pipeline
outperforms each package's independent methods for gene set analysis in
complex experimental designs, namely camera
and
link[qusage]{qgen}
. See Watson & John, forthcoming.
### SOMETHING ON MArrayLM vs. DESeqDataSet OBJECTS ###
If fit is a voom object, make sure to run lcpm(dat).
By default n.points
is set to 2^14
, or 16,384 points, which will give very
accurate p-values in most cases. Sampling at more points will increase the
accuracy of the resulting p-values, but will also linearly increase the
amount of time needed to calculate the result. With larger sample sizes, as few
as 1/4 this number of points can be used without seriously affecting the accuracy
of the resulting p-values. However, when there are a small number of samples
(i.e., fewer than 8 samples total), the t-distribution must be sampled over
a much wider range, and the number of points needed for sampling should be
increased accordingly. It is recommended that when running qmod
with fewer
than 8 samples, the number of points be increased to at least 2^15 or 2^16. It may
also be useful to test higher values of this parameter, as it can often result in
a much more significant p-value with small sample sizes.
A data frame with the following columns:
'Pathway' Name of the pathway.
'logFC' Average log2 fold change of genes in the pathway.
'p.value' The gene set p-value, as calculated using
pdf.pVal
.
'FDR' The false discovery rate, as estimated using the Benjamini-Hochberg algorithm.
Yaari, G. Bolen, C.R., Thakar, J. & Kleinstein, S.H. (2013). "Quantitative set analysis for gene expression: a method to quantify gene set differential expression including gene-gene correlations." Nucleic Acids Res., 41(18): e170. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3794608/
Turner, J.A., Bolen, C.R. & Blankenship, D.M. (2015). "Quantitative gene set analysis generalized for repeated measures, confounder adjustment, and continuous covariates." BMC Bioinformatics, 16(1): 272. http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0707-9
Smyth, G.K. (2004). "Linear models and empirical Bayes methods for assessing differential expression in microarray experiments." Stat. Appl. Genet. Molec. Biol., 3(1). http://www.statsci.org/smyth/pubs/ebayes.pdf
Love, M., Huber, W., & Anders, S. (2014). "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome Biology, 15:550. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8
Benjamini, Y., & Hochberg, Y. (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing." Journal of the Royal Statistical Society, Series B, 57:289–300.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | # Fit limma model
library(limma)
eset <- matrix(rnorm(5000 * 10), nrow = 5000, ncol = 10,
dimnames = list(seq_len(5000), paste0('S', seq_len(10))))
clin <- data.frame(treat = rep(c("ctl", "trt"), each = 5),
batch = rep(c("b1", "b2"), times = 5),
x1 = rnorm(10),
x2 = runif(10))
des <- model.matrix(~ treat + batch + x1 + x2, data = clin)
fit <- eBayes(lmFit(eset, des))
# Create list of differentially expressed pathways
geneSets = list()
for (i in 0:10) {
genes <- ((30 * i) + 1):(30 * (i + 1))
eset[genes, clin$treat == "trt"] <- eset[genes, clin$treat == "trt"] + rnorm(1)
geneSets[[paste("Set", i)]] <- genes
}
# Run qmod
top <- qmod(fit, eset, coef = 2, geneSets)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.