We have implemented a highly efficient Wilcoxon-Mann-Whitney rank sum
test for high-throughput expression profiling data. For datasets with
more than 100 features (genes), the function can be more than 1,000
times faster than the native R implementations (wilcox.test in stats,
or rankSumTestWithCorrelation in limma).
wmwTest(x, indexList, col = "GeneSymbol", valType = c("p.greater", "p.less",
"p.two.sided", "U", "abs.log10p.greater", "log10p.less",
"abs.log10p.two.sided", "Q"), simplify = TRUE)
## S4 method for signature 'matrix,IndexList'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'numeric,IndexList'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'matrix,GmtList'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'eSet,GmtList'
wmwTest(x, indexList, col = "GeneSymbol",
valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
"log10p.less", "abs.log10p.two.sided", "Q"), simplify = TRUE)
## S4 method for signature 'eSet,numeric'
wmwTest(x, indexList, col = "GeneSymbol",
valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
"log10p.less", "abs.log10p.two.sided", "Q"), simplify = TRUE)
## S4 method for signature 'eSet,logical'
wmwTest(x, indexList, col = "GeneSymbol",
valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
"log10p.less", "abs.log10p.two.sided", "Q"), simplify = TRUE)
## S4 method for signature 'eSet,list'
wmwTest(x, indexList, col = "GeneSymbol",
valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
"log10p.less", "abs.log10p.two.sided", "Q"), simplify = TRUE)
## S4 method for signature 'ANY,numeric'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'ANY,logical'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'ANY,list'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'matrix,SignedIndexList'
wmwTest(x, indexList, valType,
simplify = TRUE)
## S4 method for signature 'numeric,SignedIndexList'
wmwTest(x, indexList, valType,
simplify = TRUE)
## S4 method for signature 'eSet,SignedIndexList'
wmwTest(x, indexList, valType,
simplify = TRUE)

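x
A numeric matrix. All other data types (e.g. numeric vectors
or eSet objects such as ExpressionSet) are coerced into a matrix.
indexList
A list of integer indices (starting from 1) indicating
signature genes. Can be of length zero. Other data types (e.g. a list
of numeric or logical vectors, or a numeric or logical vector) are
coerced into such a list.
col
A string, used only when x is an eSet: the name of the featureData
column that holds the gene symbols (default "GeneSymbol").
valType
The value type to be returned. Allowed values are "p.greater",
"p.less", "p.two.sided", "U", "abs.log10p.greater", "log10p.less",
"abs.log10p.two.sided", and "Q".
simplify
Logical. If FALSE, the returned value is in matrix format; if set to
TRUE (the default), the result is simplified where possible.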
The basic application of the function is to test the enrichment of gene sets in expression profiling data or differentially expressed data (the matrix with feature/gene in rows and samples in columns).
A special case is when x is an eSet object (e.g. ExpressionSet) and
indexList is a list returned by the readGmt function. In this case, the
only requirement is that one column named GeneSymbol in the featureData
contains the gene symbols used in the GMT file. See the example below.
Besides the conventional value types such as ‘p.greater’,
‘p.less’, ‘p.two.sided’, and ‘U’ (the U-statistic),
wmwTest (from version 0.991) provides further value types:
abs.log10p.greater and log10p.less perform a log10 transformation
on the respective p-values and give the transformed value a proper
sign (positive for greater than, and negative for less than);
abs.log10p.two.sided transforms two-sided p-values to non-negative
values; and the Q score reports the absolute log10-transformed p-value
of the two-sided variant and gives it a sign, depending on whether the
signature genes tend to be greater than (positive) or less than
(negative) the remaining genes.
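To make the relation between these value types concrete, here is a minimal sketch that derives them from ordinary p-values. It uses stats::wilcox.test rather than wmwTest, so it runs without BioQC. The variable names are illustrative, and the rule used to pick the sign of Q is an assumption based on the description above, not taken from the package source.

```r
# Illustration only: relate the log10-based value types to ordinary p-values.
# Uses stats::wilcox.test, not BioQC; the sign rule for Q is an assumption.
set.seed(1)
background <- rnorm(200)
signature  <- rnorm(50, mean = 0.5)   # shifted upwards on purpose

p_greater   <- wilcox.test(signature, background, alternative = "greater")$p.value
p_less      <- wilcox.test(signature, background, alternative = "less")$p.value
p_two_sided <- wilcox.test(signature, background, alternative = "two.sided")$p.value

abs_log10p_greater   <- abs(log10(p_greater))     # non-negative
log10p_less          <- log10(p_less)             # non-positive
abs_log10p_two_sided <- abs(log10(p_two_sided))   # non-negative

# Assumed sign rule: positive when the "greater" alternative is better supported
q_sign <- if (p_greater <= p_less) 1 else -1
Q <- q_sign * abs_log10p_two_sided
```

The signed variants are convenient for heatmaps and rankings, because the magnitude encodes significance and the sign encodes direction.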
A numeric matrix or vector containing the statistic.
x = matrix, indexList = IndexList: x is a matrix and indexList is an IndexList
x = numeric, indexList = IndexList: x is a numeric and indexList is an IndexList
x = matrix, indexList = GmtList: x is a matrix and indexList is a GmtList
x = eSet, indexList = GmtList: x is an eSet and indexList is a GmtList
x = eSet, indexList = numeric: x is an eSet and indexList is a numeric
x = eSet, indexList = logical: x is an eSet and indexList is a logical
x = eSet, indexList = list: x is an eSet and indexList is a list
x = ANY, indexList = numeric: x is ANY and indexList is a numeric
x = ANY, indexList = logical: x is ANY and indexList is a logical
x = ANY, indexList = list: x is ANY and indexList is a list
x = matrix, indexList = SignedIndexList: x is a matrix and indexList is a SignedIndexList
x = numeric, indexList = SignedIndexList: x is a numeric and indexList is a SignedIndexList
x = eSet, indexList = SignedIndexList: x is an eSet and indexList is a SignedIndexList
The function has been optimized for expression profiling data. It
avoids the repeated ranking of data done by native R implementations
and uses efficient C code to increase performance and control
memory use. Simulation studies using expression profiles of 22000
genes in 2000 samples and 200 gene sets suggested that the C
implementation can be >1000 times faster than the R
implementation. Further acceleration is possible by calling the
function in parallel with mclapply from the multicore package (now
provided by the parallel package).
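The idea of avoiding repeated ranking can be sketched in plain R: rank the values once, then obtain the U statistic of each gene set from its rank sum. This is only an illustration of the principle, not BioQC's C implementation; gene_sets and u_from_ranks are hypothetical names introduced here.

```r
# Illustration only: rank the data once, then reuse the ranks for many gene sets.
# This is NOT BioQC's C code; `gene_sets` and `u_from_ranks` are hypothetical names.
set.seed(1887)
expr <- rnorm(1000)                          # one sample, 1000 features
gene_sets <- list(setA = 1:50,
                  setB = sample(1000, 120))  # 1-based feature indices

ranks <- rank(expr)                          # the expensive step, done once

u_from_ranks <- function(idx, ranks) {
  n1 <- length(idx)
  sum(ranks[idx]) - n1 * (n1 + 1) / 2        # U statistic of the gene set
}
u_fast <- vapply(gene_sets, u_from_ranks, numeric(1), ranks = ranks)

# Reference: wilcox.test re-ranks the data on every call
u_ref <- vapply(gene_sets, function(idx) {
  unname(wilcox.test(expr[idx], expr[-idx])$statistic)
}, numeric(1))

stopifnot(isTRUE(all.equal(u_fast, u_ref)))
```

Because the per-set step only sums precomputed ranks, its cost grows with the size of each gene set rather than with the number of features, and the work is independent across gene sets and samples.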
Jitao David Zhang
Barry, W.T., Nobel, A.B., and Wright, F.A. (2008). A statistical framework for testing functional categories in microarray data. _Annals of Applied Statistics_ 2, 286-315.
Wu, D., and Smyth, G.K. (2012). Camera: a competitive gene set test accounting for inter-gene correlation. _Nucleic Acids Research_ 40(17), e133.
Zar, J.H. (1999). _Biostatistical Analysis_, 4th Edition. Prentice-Hall International, Upper Saddle River, New Jersey.
wilcox.test in the stats package, and rankSumTestWithCorrelation in
the limma package.
library(BioQC)    # provides wmwTest and readGmt
library(Biobase)  # provides ExpressionSet, exprs, fData and sample.ExpressionSet

## R-native data structures
set.seed(1887)
rd <- rnorm(1000)
rl <- sample(c(TRUE, FALSE), 1000, replace=TRUE)
wmwTest(rd, rl, valType="p.two.sided")
wmwTest(rd, which(rl), valType="p.two.sided")
## shift the signature upwards
rd1 <- rd + ifelse(rl, 0.5, 0)
wmwTest(rd1, rl, valType="p.greater")
wmwTest(rd1, rl, valType="U")
## shift the signature downwards
rd2 <- rd - ifelse(rl, 0.2, 0)
wmwTest(rd2, rl, valType="p.greater")
wmwTest(rd2, rl, valType="p.two.sided")
wmwTest(rd2, rl, valType="p.less")

## matrix forms
rmat <- matrix(c(rd, rd1, rd2), ncol=3, byrow=FALSE)
wmwTest(rmat, rl, valType="p.two.sided")
wmwTest(rmat, rl, valType="p.greater")
wmwTest(rmat, which(rl), valType="p.two.sided")
wmwTest(rmat, which(rl), valType="p.greater")

## other valTypes
wmwTest(rmat, which(rl), valType="U")
wmwTest(rmat, which(rl), valType="abs.log10p.greater")
wmwTest(rmat, which(rl), valType="log10p.less")
wmwTest(rmat, which(rl), valType="abs.log10p.two.sided")
wmwTest(rmat, which(rl), valType="Q")

## using ExpressionSet
data(sample.ExpressionSet)
testSet <- sample.ExpressionSet
fData(testSet)$GeneSymbol <- paste("GENE_", 1:nrow(testSet), sep="")
mySig1 <- sample(c(TRUE, FALSE), nrow(testSet), prob=c(0.25, 0.75), replace=TRUE)
wmwTest(testSet, which(mySig1), valType="p.greater")

## using integer
exprs(testSet)[,1L] <- exprs(testSet)[,1L] + ifelse(mySig1, 50, 0)
wmwTest(testSet, which(mySig1), valType="p.greater")

## using lists
mySig2 <- sample(c(TRUE, FALSE), nrow(testSet), prob=c(0.6, 0.4), replace=TRUE)
wmwTest(testSet, list(first=mySig1, second=mySig2))

## using GMT file
gmt_file <- system.file("extdata/exp.tissuemark.affy.roche.symbols.gmt", package="BioQC")
gmt_list <- readGmt(gmt_file)
gss <- sample(unlist(sapply(gmt_list, function(x) x$genes)), 1000)
eset <- new("ExpressionSet",
            exprs=matrix(rnorm(10000), nrow=1000L),
            phenoData=new("AnnotatedDataFrame", data.frame(Sample=LETTERS[1:10])),
            featureData=new("AnnotatedDataFrame", data.frame(GeneSymbol=gss)))
esetWmwRes <- wmwTest(eset, gmt_list, valType="p.greater")
summary(esetWmwRes)