highTtest | R Documentation |
Implements the method developed by Cao and Kosorok (2011) for the significance analysis of thousands of features in high-dimensional biological studies. It is an asymptotically valid data-driven procedure to find critical values for rejection regions controlling the k-familywise error rate, false discovery rate, and the tail probability of false discovery proportion.
highTtest(dataSet1, dataSet2, gammas, compare = "BOTH", cSequence = NULL,
tSequence = NULL)
dataSet1 |
data.frame or matrix containing the dataset for subset 1 for the two-sample t-test. |
dataSet2 |
data.frame or matrix containing the dataset for subset 2 for the two-sample t-test. |
gammas |
vector of significance levels at which feature significance is to be determined. |
compare |
one of ("ST", "BH", "Both", "None"). In addition to the Cao-Kosorok method, obtain feature significance indicators using the Storey-Tibshirani method (ST) (Storey and Tibshirani, 2003), the Benjamini-Hochberg method (BH), (Benjamini andHochberg, 1995), "both" the ST and the BH methods, or do not consider alternative methods (none). |
cSequence |
A vector specifying the values of c to be considered in estimating the proportion of alternative hypotheses. If no vector is provided, a default of seq(0.01,6,0.01) is used. See Section 2.3 of Cao and Kosorok (2011) for more information. |
tSequence |
A vector specifying the search space for the critical t value. If no vector is provided, a default of seq(0.01,6,0.01) is used. |
The Storey-Tibshirani (2003), ST, method implemented in highTtest is adapted from the implementation written by Alan Dabney and John D. Storey and available from
http://www.bioconductor.org/packages/release/bioc/html/qvalue.html.
The comparison capability is included only for convenience and reproducibility of the original manuscript. For a complete analysis based on the ST method, the user is referred to the qvalue package available through the bioconductor archive.
The following methods retrieve individual results from a highTtest object, x:
BH(x)
:
Retrieves a matrix of logical values. The
rows correspond to features, the columns to levels
of significance. Matrix elements are TRUE if feature
was determined to be significant by the Benjamini-Hochberg
(1995) method.
CK(x)
:
Retrieves a matrix of logical values. The
rows correspond to features, the columns to levels
of significance. Matrix elements are TRUE if feature
was determined to be significant by the Cao-Kosorok
(2011) method.
pi_alt(x)
: Retrieves the
estimated proportion of alternative hypotheses
obtained by the Cao-Kosorok (2011) method.
pvalue(x)
: Retrieves the
vector of p-values calculated using the
two-sample t-statistic.
ST(x)
:
Retrieves a matrix of logical values. The
rows correspond to features, the columns to levels
of significance. Matrix elements are TRUE if feature
was determined to be significant by the Storey-Tibshirani
(2003) method.
A simple x-y plot comparing the number of significant features as a function of the level significance level can be generated using
plot(x,...)
: Generates a plot
of the number of significant features as a function of the
level of significance as calculated for each method (CK,BH, and/or
ST). Additional plot controls can be passed through the ellipsis.
When comparisons to the ST and BH methods are requested, Venn diagrams can be generated.
vennD(x, gamma, ...)
: Generates
two- and three-dimensional Venn diagrams comparing the
features selected by each method. Implements methods of
package VennDiagram. In addition to the highTtest
object, the level of significance, gamma
, must
also be provided. Most control argument of the
VennDiagram package can be passed through the ellipsis.
Returns an object of class highTtest
.
Authors: Hongyuan Cao, Michael R. Kosorok, and Shannon T. Holloway <shannon.t.holloway@gmail.com> Maintainer: Shannon T. Holloway <shannon.t.holloway@gmail.com>
Cao, H. and Kosorok, M. R. (2011). Simultaneous critical values for t-tests in very high dimensions. Bernoulli, 17, 347–394. PMCID: PMC3092179.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57, 289–300.
Storey, J. and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, USA, 100, 9440–9445.
set.seed(123)
x1 <- matrix(c(runif(500),runif(500,0.25,1)),nrow=100)
obj <- highTtest(dataSet1=x1[,1:5],
dataSet2=x1[,6:10],
gammas=seq(0.1,1,0.1),
tSequence=seq(0.001,3,0.001))
#Print number of significant features identified in each method
colSums(CK(obj))
colSums(ST(obj))
colSums(BH(obj))
#Plot the number of significant features identified in each method
plot(obj, main="Example plot")
vennD(obj, 0.8, Title="Example vennD")
#Proportion of alternative hypotheses
pi_alt(obj)
#p-values
pvalue(obj)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.