Description Usage Arguments Details Value Methods (by class) Note Author(s) References See Also Examples
We have implemented a highly efficient Wilcoxon-Mann-Whitney rank sum
test for high-throughput expression profiling data. For datasets with
more than 100 features (genes), the function can be more than 1,000
times faster than its R implementations (wilcox.test
in
stats
, or rankSumTestWithCorrelation
in limma
).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 | wmwTest(
x,
indexList,
col = "GeneSymbol",
valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
"log10p.less", "abs.log10p.two.sided", "Q"),
simplify = TRUE
)
## S4 method for signature 'matrix,IndexList'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'numeric,IndexList'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'matrix,GmtList'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'eSet,GmtList'
wmwTest(
x,
indexList,
col = "GeneSymbol",
valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
"log10p.less", "abs.log10p.two.sided", "Q"),
simplify = TRUE
)
## S4 method for signature 'eSet,numeric'
wmwTest(
x,
indexList,
col = "GeneSymbol",
valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
"log10p.less", "abs.log10p.two.sided", "Q"),
simplify = TRUE
)
## S4 method for signature 'eSet,logical'
wmwTest(
x,
indexList,
col = "GeneSymbol",
valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
"log10p.less", "abs.log10p.two.sided", "Q"),
simplify = TRUE
)
## S4 method for signature 'eSet,list'
wmwTest(
x,
indexList,
col = "GeneSymbol",
valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
"log10p.less", "abs.log10p.two.sided", "Q"),
simplify = TRUE
)
## S4 method for signature 'ANY,numeric'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'ANY,logical'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'ANY,list'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'matrix,SignedIndexList'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'matrix,SignedGenesets'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'numeric,SignedIndexList'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'eSet,SignedIndexList'
wmwTest(x, indexList, valType, simplify = TRUE)
## S4 method for signature 'eSet,SignedGenesets'
wmwTest(
x,
indexList,
col = "GeneSymbol",
valType = c("p.greater", "p.less", "p.two.sided", "U", "abs.log10p.greater",
"log10p.less", "abs.log10p.two.sided", "Q"),
simplify = TRUE
)
|
x |
A numeric matrix. All other data types (e.g. numeric vectors
or |
indexList |
A list of integer indices (starting from 1) indicating
signature genes. Can be of length zero. Other data types (e.g. a list
of numeric or logical vectors, or a numeric or logical vector) are
coerced into such a list. See |
col |
a string sometimes used with a |
valType |
The value type to be returned, allowed values
include |
simplify |
Logical. If not, the returning value is in matrix
format; if set to |
The basic application of the function is to test the enrichment of gene sets in expression profiling data or differentially expressed data (the matrix with feature/gene in rows and samples in columns).
A special case is when x
is an eSet
object
(e.g. ExpressionSet
), and indexList
is a list returned
from readGmt
function. In this case, the only requirement is
that one column named GeneSymbol
in the featureData
contain gene symbols used in the GMT file. The same applies to signed Gmt files. See the example below.
Besides the conventional value types such as ‘p.greater’,
‘p.less’, ‘p.two.sided’ , and ‘U’ (the U-statistic),
wmwTest
(from version 0.99-1) provides further value types:
abs.log10p.greater
and log10p.less
perform log10
transformation on respective p-values and give the
transformed value a proper sign (positive for greater than, and
negative for less than); abs.log10p.two.sided
transforms
two-sided p-values to non-negative values; and Q
score
reports absolute log10-transformation of p-value of the
two-side variant, and gives a proper sign to it, depending on whether it is
rather greater than (positive) or less than (negative).
A numeric matrix or vector containing the statistic.
x = matrix,indexList = IndexList
: x
is a matrix
and indexList
is a IndexList
x = numeric,indexList = IndexList
: x
is a numeric
and indexList
is a IndexList
x = matrix,indexList = GmtList
: x
is a matrix
and indexList
is a GmtList
x = eSet,indexList = GmtList
: x
is a eSet
and indexList
is a GmtList
x = eSet,indexList = numeric
: x
is a eSet
and indexList
is a numeric
x = eSet,indexList = logical
: x
is a eSet
and indexList
is a logical
x = eSet,indexList = list
: x
is a eSet
and indexList
is a list
x = ANY,indexList = numeric
: x
is ANY
and indexList
is a numeric
x = ANY,indexList = logical
: x
is ANY
and indexList
is a logical
x = ANY,indexList = list
: x
is ANY
and indexList
is a list
x = matrix,indexList = SignedIndexList
: x
is a matrix
and indexList
is a
SignedIndexList
x = matrix,indexList = SignedGenesets
: x
is a eSet
and indexList
is a
SignedIndexList
x = numeric,indexList = SignedIndexList
: x
is a numeric
and indexList
is a
SignedIndexList
x = eSet,indexList = SignedIndexList
: x
is a eSet
and indexList
is a
SignedIndexList
x = eSet,indexList = SignedGenesets
: x
is a eSet
and indexList
is a
SignedIndexList
The function has been optimized for expression profiling data. It
avoids repetitive ranking of data as done by native R implementations
and uses efficient C code to increase the performance and control
memory use. Simulation studies using expression profiles of 22000
genes in 2000 samples and 200 gene sets suggested that the C
implementation can be >1000 times faster than the R
implementation. And it is possible to further accelerate by
parallel calling the function with mclapply
in the multicore
package.
Jitao David Zhang <jitao_david.zhang@roche.com>
Barry, W.T., Nobel, A.B., and Wright, F.A. (2008). A statistical framework for testing functional namespaces in microarray data. _Annals of Applied Statistics_ 2, 286-315.
Wu, D, and Smyth, GK (2012). Camera: a competitive gene set test accounting for inter-gene correlation. _Nucleic Acids Research_ 40(17):e133
Zar, JH (1999). _Biostatistical Analysis 4th Edition_. Prentice-Hall International, Upper Saddle River, New Jersey.
codewilcox.test in the stats
package, and rankSumTestWithCorrelation
in
the limma
package.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 | ## R-native data structures
set.seed(1887)
rd <- rnorm(1000)
rl <- sample(c(TRUE, FALSE), 1000, replace=TRUE)
wmwTest(rd, rl, valType="p.two.sided")
wmwTest(rd, which(rl), valType="p.two.sided")
rd1 <- rd + ifelse(rl, 0.5, 0)
wmwTest(rd1, rl, valType="p.greater")
wmwTest(rd1, rl, valType="U")
rd2 <- rd - ifelse(rl, 0.2, 0)
wmwTest(rd2, rl, valType="p.greater")
wmwTest(rd2, rl, valType="p.two.sided")
wmwTest(rd2, rl, valType="p.less")
## matrix forms
rmat <- matrix(c(rd, rd1, rd2), ncol=3, byrow=FALSE)
wmwTest(rmat, rl, valType="p.two.sided")
wmwTest(rmat, rl, valType="p.greater")
wmwTest(rmat, which(rl), valType="p.two.sided")
wmwTest(rmat, which(rl), valType="p.greater")
## other valTypes
wmwTest(rmat, which(rl), valType="U")
wmwTest(rmat, which(rl), valType="abs.log10p.greater")
wmwTest(rmat, which(rl), valType="log10p.less")
wmwTest(rmat, which(rl), valType="abs.log10p.two.sided")
wmwTest(rmat, which(rl), valType="Q")
## using ExpressionSet
data(sample.ExpressionSet)
testSet <- sample.ExpressionSet
fData(testSet)$GeneSymbol <- paste("GENE_",1:nrow(testSet), sep="")
mySig1 <- sample(c(TRUE, FALSE), nrow(testSet), prob=c(0.25, 0.75), replace=TRUE)
wmwTest(testSet, which(mySig1), valType="p.greater")
## using integer
exprs(testSet)[,1L] <- exprs(testSet)[,1L] + ifelse(mySig1, 50, 0)
wmwTest(testSet, which(mySig1), valType="p.greater")
## using lists
mySig2 <- sample(c(TRUE, FALSE), nrow(testSet), prob=c(0.6, 0.4), replace=TRUE)
wmwTest(testSet, list(first=mySig1, second=mySig2))
## using GMT file
gmt_file <- system.file("extdata/exp.tissuemark.affy.roche.symbols.gmt", package="BioQC")
gmt_list <- readGmt(gmt_file)
gss <- sample(unlist(sapply(gmt_list, function(x) x$genes)), 1000)
eset<-new("ExpressionSet",
exprs=matrix(rnorm(10000), nrow=1000L),
phenoData=new("AnnotatedDataFrame", data.frame(Sample=LETTERS[1:10])),
featureData=new("AnnotatedDataFrame",data.frame(GeneSymbol=gss)))
esetWmwRes <- wmwTest(eset ,gmt_list, valType="p.greater")
summary(esetWmwRes)
## using signed GMT file
signed_gmt_file <- system.file("extdata/test.gmt", package="BioQC")
signed_gmt <- readSignedGmt(signed_gmt_file)
esetSignedWmwRes <- wmwTest(eset, signed_gmt, valType="p.greater")
esetMat <- exprs(eset); rownames(esetMat) <- fData(eset)$GeneSymbol
esetSignedWmwRes2 <- wmwTest(esetMat, signed_gmt, valType="p.greater")
|
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which.max, which.min
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
[1] 0.8535404
[1] 0.8535404
[1] 1.949087e-14
[1] 90452
[1] 0.9976222
[1] 0.004758899
[1] 0.002379449
[1] 8.535404e-01 3.898175e-14 4.758899e-03
[1] 4.267702e-01 1.949087e-14 9.976222e-01
[1] 8.535404e-01 3.898175e-14 4.758899e-03
[1] 4.267702e-01 1.949087e-14 9.976222e-01
[1] 124152 90452 137887
[1] 0.369805934 13.710168669 0.001033906
[1] -2.416062e-01 -8.437865e-15 -2.623524e+00
[1] 0.06877594 13.40913867 2.32249352
[1] 0.06877594 13.40913867 -2.32249352
A B C D E F G H
0.3968848 0.4750041 0.4090897 0.5238279 0.5757461 0.4790929 0.4343313 0.6157803
I J K L M N O P
0.6872272 0.5165239 0.4499611 0.5820556 0.4755881 0.4934181 0.4666891 0.4785086
Q R S T U V W X
0.6280457 0.2976157 0.3232340 0.6152199 0.6149396 0.3576255 0.3540736 0.4102298
Y Z
0.5086293 0.4288550
A B C D E F
4.427725e-06 4.750041e-01 4.090897e-01 5.238279e-01 5.757461e-01 4.790929e-01
G H I J K L
4.343313e-01 6.157803e-01 6.872272e-01 5.165239e-01 4.499611e-01 5.820556e-01
M N O P Q R
4.755881e-01 4.934181e-01 4.666891e-01 4.785086e-01 6.280457e-01 2.976157e-01
S T U V W X
3.232340e-01 6.152199e-01 6.149396e-01 3.576255e-01 3.540736e-01 4.102298e-01
Y Z
5.086293e-01 4.288550e-01
A B C D E F G
first 4.427725e-06 0.4750041 0.4090897 0.5238279 0.5757461 0.4790929 0.4343313
second 3.322462e-01 0.5830970 0.4508703 0.4503638 0.4788279 0.2795935 0.5079108
H I J K L M N
first 0.6157803 0.6872272 0.5165239 0.4499611 0.5820556 0.4755881 0.4934181
second 0.5554548 0.6009788 0.5637822 0.4024829 0.4129130 0.5557076 0.4844365
O P Q R S T U
first 0.4666891 0.4785086 0.6280457 0.2976157 0.3232340 0.6152199 0.6149396
second 0.5477364 0.5394920 0.4508703 0.5673067 0.4577155 0.4478326 0.4821416
V W X Y Z
first 0.3576255 0.3540736 0.4102298 0.5086293 0.4288550
second 0.3385415 0.6612245 0.5985069 0.6346372 0.3837895
1 2 3 4
Min. :0.02198 Min. :0.01316 Min. :0.005433 Min. :0.007333
1st Qu.:0.28609 1st Qu.:0.24582 1st Qu.:0.194880 1st Qu.:0.233636
Median :0.55866 Median :0.51048 Median :0.460976 Median :0.598480
Mean :0.55896 Mean :0.50834 Mean :0.470506 Mean :0.547411
3rd Qu.:0.79758 3rd Qu.:0.72584 3rd Qu.:0.706622 3rd Qu.:0.848243
Max. :1.00000 Max. :1.00000 Max. :1.000000 Max. :1.000000
5 6 7 8
Min. :0.002327 Min. :0.01225 Min. :0.00879 Min. :0.0002731
1st Qu.:0.262446 1st Qu.:0.27261 1st Qu.:0.23966 1st Qu.:0.2535412
Median :0.531550 Median :0.58203 Median :0.52827 Median :0.5691982
Mean :0.523777 Mean :0.55246 Mean :0.52780 Mean :0.5251984
3rd Qu.:0.773279 3rd Qu.:0.79958 3rd Qu.:0.80507 3rd Qu.:0.7977362
Max. :1.000000 Max. :1.00000 Max. :1.00000 Max. :1.0000000
9 10
Min. :0.001639 Min. :0.009709
1st Qu.:0.265712 1st Qu.:0.273843
Median :0.503677 Median :0.517334
Mean :0.513945 Mean :0.531123
3rd Qu.:0.738335 3rd Qu.:0.817494
Max. :1.000000 Max. :1.000000
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.