tmodUtest: Perform a statistical test of module expression

View source: R/statisticaltests.R

tmodUtest R Documentation

Description

Perform a statistical test of module expression

Usage

tmodUtest(
  l,
  modules = NULL,
  qval = 0.05,
  order.by = "pval",
  filter = FALSE,
  mset = "all",
  cols = "Title",
  useR = FALSE,
  nodups = TRUE
)

tmodGeneSetTest(
  l,
  x,
  modules = NULL,
  qval = 0.05,
  order.by = "pval",
  filter = FALSE,
  mset = "all",
  cols = "Title",
  Nsim = 1000,
  nodups = TRUE
)

tmodCERNOtest(
  l,
  modules = NULL,
  qval = 0.05,
  order.by = "pval",
  filter = FALSE,
  mset = "all",
  cols = "Title",
  nodups = TRUE
)

tmodPLAGEtest(
  l,
  x,
  group,
  modules = NULL,
  qval = 0.05,
  order.by = "pval",
  mset = "all",
  cols = "Title",
  filter = FALSE,
  nodups = TRUE
)

tmodZtest(
  l,
  modules = NULL,
  qval = 0.05,
  order.by = "pval",
  filter = FALSE,
  mset = "all",
  cols = "Title",
  nodups = TRUE
)

tmodHGtest(
  fg,
  bg,
  modules = NULL,
  qval = 0.05,
  order.by = "pval",
  filter = FALSE,
  mset = "all",
  cols = "Title",
  nodups = TRUE
)

Arguments

l

sorted list of HGNC gene identifiers

modules

optional list of modules for which to make the test

qval

Threshold FDR value to report

order.by

Order by P value ("pval") or none ("none")

filter

Remove gene names which have no module assignments

mset

Which module set to use. Either a character vector ("LI", "DC" or "all", default: all) or an object of class tmod (see "Custom module definitions" below)

cols

Which columns from the MODULES data frame should be included in the results

useR

use the R wilcox.test function; slow, but with exact p-values for small samples

nodups

Remove duplicate gene names in l and corresponding rows from ranks

x

Expression matrix for the tmodPLAGEtest; a vector for tmodGeneSetTest

Nsim

for tmodGeneSetTest, number of replicates for the randomization test

group

group assignments for the tmodPLAGEtest

fg

foreground gene set for the HG test

bg

background gene set for the HG test

Details

Performs a test either on an ordered list of genes (tmodUtest, tmodCERNOtest, tmodZtest) or on two groups of genes (tmodHGtest). tmodUtest is a U test on the ranks of genes that are contained in a module.
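The idea behind the U test on ranks can be sketched in base R. This is only an illustration under stated assumptions, not the tmod implementation: the gene names, the module membership and the one-sided alternative are all hypothetical choices made for this example.

```r
# Hypothetical toy data: 100 ranked genes and a module of 10 genes near the top.
genes  <- paste0("gene", 1:100)   # plays the role of `l`, already sorted
module <- genes[c(1, 2, 3, 5, 8, 9, 12, 15, 20, 30)]

ranks     <- seq_along(genes)
in_module <- genes %in% module

# One-sided U test: do module genes have smaller ranks than the other genes?
u_res <- wilcox.test(ranks[in_module], ranks[!in_module], alternative = "less")

# AUC as an effect size: the probability that a randomly chosen module gene
# is ranked above (has a smaller rank than) a randomly chosen non-module gene.
n1  <- sum(in_module)
n2  <- sum(!in_module)
auc <- 1 - unname(u_res$statistic) / (n1 * n2)
```

An AUC close to 1 indicates that the module genes are concentrated at the top of the list; an AUC near 0.5 indicates no enrichment.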

tmodCERNOtest is also a nonparametric test working on gene ranks, but it originates from Fisher's combined probability test. This test gives more weight to genes with lower ranks, so the resulting p-values correspond better to the observed effect size. In effect, modules with a small effect but many genes get higher p-values than with the U test.
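A minimal sketch of a Fisher-style combination of ranks, assuming the statistic takes the usual form of Fisher's method with r/N used in place of p-values. The ranks below are hypothetical, and the code illustrates the principle rather than reproducing tmod's source.

```r
# Hypothetical ranks of 10 module genes in a sorted list of N = 100 genes.
N            <- 100
module_ranks <- c(1, 2, 3, 5, 8, 9, 12, 15, 20, 30)
k            <- length(module_ranks)

# Fisher's method: -2 * sum(log(p_i)) follows a chi-squared distribution
# with 2k degrees of freedom; here p_i is replaced by the relative rank r_i / N.
cerno_stat <- -2 * sum(log(module_ranks / N))
cerno_p    <- pchisq(cerno_stat, df = 2 * k, lower.tail = FALSE)
```

Because of the logarithm, a gene at rank 1 contributes far more to the statistic than a gene at rank 30, which is how lower ranks receive more weight.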

tmodPLAGEtest is based on PLAGE ("Pathway level analysis of gene expression"), published by Tomfohr, Lu and Kepler (2005), doi 10.1186/1471-2105-6-225. In essence, it is a t-test run on module eigengenes, but it performs well in practice. This approach can be used with any complex linear model; for this, use the function eigengene(). See the user's guide for details.
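The "t-test on an eigengene" idea can be illustrated with base R alone. The sketch below assumes that the eigengene is the first right singular vector of the row-standardized expression matrix of the module genes; the simulated data and group labels are hypothetical, and the details of tmod's eigengene() function may differ.

```r
set.seed(1)

# Hypothetical expression data: 10 module genes (rows) in 20 samples (columns).
n_genes   <- 10
n_samples <- 20
group     <- rep(c("control", "case"), each = 10)
x         <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)

# Simulate a module that is upregulated in cases.
x[, group == "case"] <- x[, group == "case"] + 3

# Standardize each gene (row) to mean 0 and standard deviation 1.
x_std <- t(scale(t(x)))

# Eigengene: first right singular vector, one value per sample.
# Its sign is arbitrary, so a two-sided test is used.
eig <- svd(x_std)$v[, 1]

plage <- t.test(eig ~ group)
```

The eigengene summarizes the module as a single value per sample, so any model that accepts a numeric response could be used in place of the t-test.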

tmodZtest works very much like tmodCERNOtest, but instead of combining the rank-derived p-values using Fisher's method, it uses the Stouffer method (known also as the Z-transform test).
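The Stouffer combination can be sketched as follows. The ranks are hypothetical, and the offset of 0.5 is an assumption made here so that the last rank does not map to an infinite z-score; tmod may handle this differently.

```r
# Hypothetical ranks of 10 module genes in a sorted list of N = 100 genes.
N            <- 100
module_ranks <- c(1, 2, 3, 5, 8, 9, 12, 15, 20, 30)
k            <- length(module_ranks)

# Rank-derived p-values (assumption: offset of 0.5 keeps them inside (0, 1)).
p_rank <- (module_ranks - 0.5) / N

# Stouffer's method: convert each p-value to a z-score, sum them,
# and divide by sqrt(k) so the result is standard normal under the null.
z_i       <- qnorm(p_rank, lower.tail = FALSE)
z_stat    <- sum(z_i) / sqrt(k)
z_p_value <- pnorm(z_stat, lower.tail = FALSE)
```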

tmodGeneSetTest is an implementation of the function geneSetTest from the limma package (note that tmodUtest is equivalent to limma's wilcoxGST function).
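A randomization test of this kind can be sketched in base R. The example assumes that the test compares the mean statistic of the gene set with the means of Nsim random gene sets of the same size; the data are simulated, and this is neither limma's nor tmod's code.

```r
set.seed(42)

# Hypothetical per-gene statistic (plays the role of `x`) for 200 genes;
# the first 15 genes form the gene set and carry a true effect.
x      <- rnorm(200)
in_set <- 1:15
x[in_set] <- x[in_set] + 3

Nsim <- 1000

# Observed mean statistic in the gene set.
obs <- mean(x[in_set])

# Null distribution: mean statistic of random gene sets of the same size.
sim <- replicate(Nsim, mean(sample(x, length(in_set))))

# One-sided p-value with the usual +1 correction for randomization tests.
p_sim <- (sum(sim >= obs) + 1) / (Nsim + 1)
```

The smallest p-value that can be obtained is 1 / (Nsim + 1), so Nsim limits the resolution of the test.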

For a discussion of the above three methods, see M. C. Whitlock, "Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach", J. Evol. Biol. 2005 (doi: 10.1111/j.1420-9101.2005.00917.x).

tmodHGtest is simply a hypergeometric test.
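The hypergeometric test for a single module can be written with phyper() from base R. The foreground, background and module below are hypothetical, and the enrichment formula is the usual ratio of proportions, assumed here for illustration.

```r
# Hypothetical gene sets: 1000 background genes, a module of 50 genes,
# and a foreground of 50 genes, 20 of which belong to the module.
bg     <- paste0("gene", 1:1000)
module <- bg[1:50]
fg     <- c(bg[1:20], bg[501:530])

b <- sum(fg %in% module)   # module genes in the foreground
n <- length(fg)            # foreground size
B <- sum(bg %in% module)   # module genes in the background
N <- length(bg)            # background size

# P(X >= b) when drawing n genes from N, of which B are in the module.
hg_p <- phyper(b - 1, B, N - B, n, lower.tail = FALSE)

# Enrichment: proportion in the foreground relative to the background.
enrichment <- (b / n) / (B / N)
```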

In tmod, two module sets can be used: "LI" (from Li et al. 2013) or "DC" (from Chaussabel et al. 2008). The module set can be selected using the parameter "mset"; if mset is "all", both sets are used.

Value

The statistical tests return a data frame with module names, an additional statistic (e.g. enrichment or AUC, depending on the test), the P value and the FDR q-value (the P value corrected for multiple testing using the p.adjust function and the Benjamini-Hochberg correction). The data frame has class 'colorDF' (see package colorDF for details), but apart from being printed in color on the terminal, it behaves just like an ordinary data.frame. To strip the coloring, use colorDF::uncolor().
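The multiple-testing correction and the qval threshold can be illustrated with p.adjust(). The data frame below is hypothetical and its column names are chosen for this example only; they are not guaranteed to match the columns returned by tmod.

```r
# Hypothetical per-module P values.
res <- data.frame(
  ID      = c("M1", "M2", "M3", "M4"),
  P.Value = c(0.001, 0.01, 0.03, 0.2)
)

# Benjamini-Hochberg correction yields the FDR q-values.
res$adj.P.Val <- p.adjust(res$P.Value, method = "BH")

# Keep only modules below the FDR threshold (analogous to qval = 0.05).
qval    <- 0.05
res_sig <- res[res$adj.P.Val < qval, ]
```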

Custom module definitions

Custom and arbitrary module, gene set or pathway definitions can also be provided through the mset option, if the parameter is a list rather than a character vector. The list passed to mset must contain the following members: "MODULES", "MODULES2GENES" and "GENES".

"MODULES" and "GENES" are data frames. It is required that MODULES contains the following columns: "ID", specifying a unique identifier of a module, and "Title", containing the description of the module. The data frame "GENES" must contain the column "ID".

The list MODULES2GENES is a mapping between modules and genes. The names of the list must correspond to the ID column of the MODULES data frame. The members of the list are character vectors, and the values of these vectors must correspond to the ID column of the GENES data frame.
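The structure described above can be assembled and checked in base R. All identifiers and titles in this sketch are hypothetical. Whether a plain list is accepted directly, or has to be converted to an object of class tmod first, may depend on the installed tmod version.

```r
# Hypothetical module annotation: one row per module.
MODULES <- data.frame(
  ID    = c("SET1", "SET2"),
  Title = c("Hypothetical set one", "Hypothetical set two"),
  stringsAsFactors = FALSE
)

# Hypothetical gene annotation: one row per gene.
GENES <- data.frame(
  ID = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  stringsAsFactors = FALSE
)

# Mapping from module IDs to character vectors of gene IDs.
MODULES2GENES <- list(
  SET1 = c("GENE_A", "GENE_B"),
  SET2 = c("GENE_B", "GENE_C", "GENE_D")
)

my_mset <- list(MODULES = MODULES, MODULES2GENES = MODULES2GENES, GENES = GENES)

# Consistency checks that mirror the requirements stated above.
ok_columns <- all(c("ID", "Title") %in% colnames(my_mset$MODULES)) &&
  "ID" %in% colnames(my_mset$GENES)
ok_modules <- all(names(my_mset$MODULES2GENES) %in% my_mset$MODULES$ID)
ok_genes   <- all(unlist(my_mset$MODULES2GENES) %in% my_mset$GENES$ID)
```

If all three checks are TRUE, the list satisfies the requirements on column names and ID correspondence.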

See Also

tmod-package

Examples

data(tmod)
fg <- tmod$MODULES2GENES[["LI.M127"]]
bg <- tmod$GENES$ID
result <- tmodHGtest( fg, bg )

## A more sophisticated example
## Gene set enrichment in TB patients compared to 
## healthy controls (Egambia data set)
## Not run: 
data(Egambia)
library(limma)
design <- cbind(Intercept=rep(1, 30), TB=rep(c(0,1), each= 15))
fit <- eBayes( lmFit(Egambia[,-c(1:3)], design))
tt <- topTable(fit, coef=2, number=Inf, genelist=Egambia[,1:3] )
tmodUtest(tt$GENE_SYMBOL)
tmodCERNOtest(tt$GENE_SYMBOL)

## End(Not run)

tmod documentation built on March 31, 2023, 9 p.m.