multtest: Multiple testing correction for the Global Test

Description

A collection of multiple testing procedures for the Global Test. It includes methods for the focus level procedure of Goeman and Mansmann for graph-structured hypotheses, and for the inheritance procedure based on Meinshausen.

Usage

# The focus level method:
focusLevel (test, sets, focus, ancestors, offspring,
           stop = 1, atoms = TRUE, trace)

findFocus (sets, ancestors, offspring, maxsize = 10, atoms = TRUE)


# The inheritance method:
inheritance (test, sets, weights, ancestors, offspring, Shaffer,
            homogeneous = TRUE, trace)


# Utilities for focus level and inheritance method:
leafNodes (object, alpha=0.05, type = c("focuslevel","inheritance"))

draw (object, alpha = 0.05, type = c("focuslevel","inheritance"),
      names=FALSE, sign.only = FALSE, interactive = FALSE)

Arguments

object

A gt.object, usually one in which more than one test was performed.

test

Either a function or a gt.object. If a function, it should take a vector of covariate labels as its argument and return a (raw) p-value. See the examples below. If a gt.object, the call to gt that created it must have had all the covariates of sets (below) in its alternative argument.

sets

A named list representing covariate sets of the hypotheses of interest, for which adjusted p-values are to be calculated. If it is missing but test is a gt.object, the subsets slot of that object will be used. If used in the inheritance procedure, sets describes a tree structure of hypotheses; in this case it may also be an object of class hclust or dendrogram.

focus

The focus level of the focus level method. Must be a subset of names(sets). Represents the level of the graph at which the method is focused, i.e. has most power.

ancestors

An environment or list that maps each set in sets to all its ancestors, i.e. its proper supersets. If missing, ancestors is determined from the input of offspring, or, if that is also missing, from the input of sets (time-consuming).

offspring

An environment or list that maps each set in sets to all its offspring, i.e. its proper subsets. If missing, offspring is determined from the input of ancestors, or, if that is also missing, from the input of sets (time-consuming).

stop

Determines when to stop the algorithm. If stop is set to a value smaller than or equal to 1, the algorithm only calculates familywise error rate corrected p-values of at most stop. If stop is set to a value greater than 1, the algorithm stops when it has rejected at least stop hypotheses. If set to exactly 1, the algorithm calculates all familywise error rate corrected p-values. Corrected p-values that are not calculated are reported as NA.

atoms

If set to TRUE, the focus level algorithm partitions the offspring of each focus level set into the smallest possible building blocks, called atoms. Doing this often greatly accelerates computation, but sometimes at the cost of some power.

trace

If set to TRUE, reports progress information. The default is obtained from gt.options()$trace. Alternatively, setting trace = 2 gives much more extensive output (focusLevel only).

maxsize

Parameter to choose the height of the focus level. The focus level sets are chosen in such a way that the number of tests that is to be done for each focus level set is at most 2^maxsize - 1.

alpha

The alpha level of familywise error control for the significant subgraph.

Shaffer

If set to TRUE, applies the Shaffer improvement. If Shaffer is NULL and test is a gt.object, the procedure checks whether Shaffer = TRUE is valid and sets the value accordingly.

weights

Optional weights vector for the leaf nodes. If it is missing but test is a gt.object, the result of weights(test) will be used. In all other cases weights is set to be uniform among all leaf nodes.

homogeneous

If set to TRUE, redistributes the alpha of rejected leaf node hypotheses homogeneously over the hypotheses under test, rather than to the most closely related hypotheses.

type

Argument for specifying which multiple testing correction method should be used. Only relevant if both the inheritance and the focuslevel procedures were performed on the same set of test results.

names

If set to TRUE, draws the graph with node names rather than numbers.

sign.only

If set to TRUE, draws only the subgraph corresponding to the significant nodes. If FALSE, draws the full graph with the non-significant nodes grayed out.

interactive

If set to TRUE, creates an interactive graph in which the user can see the node label by clicking on the node.

Details

Multiple testing correction becomes important if the Global Test is performed on many covariate subsets.

If the hypotheses are structured in such a way that many of the tested subsets are subsets of other sets, more powerful procedures can be applied that take advantage of this structure to gain power. Two methods are implemented in the globaltest package: the inheritance method for tree-structured hypotheses and the focusLevel method for general directed acyclic graphs. For simple multiple testing that does not use such structure, see p.adjust.
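As a point of comparison for the structured methods, the following minimal sketch shows unstructured correction with base R's p.adjust on a plain numeric vector. The p-values are hypothetical illustrative numbers, not output of gt, and the sketch uses stats::p.adjust directly rather than the gt.object method.

```r
# Hypothetical raw p-values for four covariate sets (illustrative values only)
p.raw <- c(abc = 0.001, cde = 0.02, fgh = 0.30, hij = 0.04)

# Holm: familywise error rate control, valid under any dependence structure
p.holm <- p.adjust(p.raw, method = "holm")

# Benjamini-Hochberg: false discovery rate control
p.bh <- p.adjust(p.raw, method = "BH")

# Side-by-side comparison of raw and adjusted p-values
cbind(raw = p.raw, holm = p.holm, BH = p.bh)
```

Neither adjustment uses the subset relations between the sets, which is the information the focus level and inheritance methods exploit.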

The focusLevel procedure makes use of the fact that some sets are subsets or supersets of each other, as specified by the user in the offspring and ancestors arguments. Viewing the subset and superset structure as a graph, the procedure starts testing at a focus level: a subset of the nodes of the graph. If the procedure finds significance at this focus level, it proceeds to find significant subsets and supersets of the focus level sets. Like Holm's procedure, the focus level procedure is valid regardless of the correlation structure between the test statistics.
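The subset and superset structure described above can be sketched in base R. This is only an illustration of what the ancestors and offspring mappings contain, assuming a small hypothetical collection of sets; the helper is.proper.subset is not part of the package, and the package may derive these mappings differently.

```r
# A small hypothetical collection of named covariate sets
sets <- list(a = "A", b = "B", c = "C",
             abc = c("A", "B", "C"), cde = c("C", "D", "E"),
             all = LETTERS[1:5])

# Hypothetical helper: TRUE if x is a proper subset of y
is.proper.subset <- function(x, y) all(x %in% y) && !all(y %in% x)

# offspring: for each set, the names of all its proper subsets
offspring <- lapply(sets, function(s)
  names(sets)[vapply(sets, is.proper.subset, logical(1), y = s)])

# ancestors: for each set, the names of all its proper supersets
ancestors <- lapply(sets, function(s)
  names(sets)[vapply(sets, function(s2) is.proper.subset(s, s2), logical(1))])

offspring$abc
ancestors$c
```

Lists of this shape (set name mapped to a character vector of set names) match the description of the ancestors and offspring arguments.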

The focus level method requires the choice of a “focus level” in the graph. The findFocus function is a utility function for automatically choosing a focus level. It chooses a collection of focus level sets in such a way that the number of tests to be done for each focus level node is at most 2^maxsize - 1. In practice this usually means that each focus level node has at most maxsize leaf nodes as offspring. Choosing focus level nodes with too many offspring nodes may result in excessively long computation times.
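The bound 2^maxsize - 1 given for the maxsize argument grows quickly, which is why large focus level nodes are costly. The sketch below assumes, as an interpretation rather than a statement about the package internals, that each test corresponds to a non-empty union of the leaf nodes below a focus level node; the leaf names are hypothetical.

```r
# Upper bound on the number of tests per focus level node for the default maxsize
maxsize <- 10
n.tests.max <- 2^maxsize - 1

# Enumerate all non-empty subsets of three hypothetical leaf nodes
leaves <- c("A", "B", "C")
subsets <- unlist(
  lapply(seq_along(leaves), function(k) combn(leaves, k, simplify = FALSE)),
  recursive = FALSE
)

# The count matches 2^k - 1 for k = 3 leaves
length(subsets) == 2^length(leaves) - 1
```

Each additional leaf node roughly doubles the count, so a modest maxsize keeps the computation tractable.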

The inheritance method is an alternative method for calculating familywise error rate corrected p-values. Like the focus level method, inheritance also makes use of the structure of the tested sets to gain power. In this case, however, the graph is restricted to a tree, as can be obtained for example if the tested subsets are obtained from a hierarchical clustering. The inheritance procedure is used in the covariates function. Like Holm's method and the focus level method, the inheritance procedure makes no assumptions on the joint distribution of the test statistics.
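To see why a hierarchical clustering yields a tree of hypotheses, the sketch below lists the covariate set attached to each internal node of an hclust object, using only base R. The data are random and the loop is an illustration of the tree structure, not the conversion that inheritance itself performs.

```r
# Random data with five named covariates (illustrative only)
set.seed(1)
X <- matrix(rnorm(100), 20, 5, dimnames = list(NULL, LETTERS[1:5]))
hcl <- hclust(dist(t(X)))

# hcl$merge has one row per internal node: negative entries are leaves
# (indices into hcl$labels), positive entries refer to earlier rows.
node.sets <- vector("list", nrow(hcl$merge))
for (i in seq_len(nrow(hcl$merge))) {
  members <- lapply(hcl$merge[i, ], function(j)
    if (j < 0) hcl$labels[-j] else node.sets[[j]])
  node.sets[[i]] <- sort(unlist(members))
}

# One covariate set per internal node; the last one is the root
node.sets
```

Any two of these sets are either nested or disjoint, which is the tree restriction the inheritance method relies on.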

The leafNodes function extracts the leaf nodes of the significant subgraph after a focus level or inheritance procedure was performed. As this graph is defined by its leaf nodes, this is the most efficient summary of the test result. Only implemented for gt.object input.

The draw function draws the graph, displaying the significant nodes. It either draws the full graph with the non-significant nodes grayed out (sign.only = FALSE, the default), or it draws only the subgraph corresponding to the significant nodes (sign.only = TRUE).

See the vignette for extensive applications.

Value

The function multtest returns an object of class gt.object with an appropriate column added to the test results matrix.

The focusLevel and inheritance functions return a gt.object if a gt.object was given as input; otherwise they return a matrix with a column of raw p-values and a column of corrected p-values.

The function leafNodes returns a gt.object.

findFocus returns a character vector.

Note

In the graph terminology of the focus level method, ancestor means superset, and offspring means subset.

The validity of the focus level procedure depends on certain assumptions on the null hypothesis that is tested for each set. See the paper by Goeman and Mansmann (cited below) for the precise assumptions. Similar assumptions are necessary for the Shaffer improvement of the inheritance procedure.

Author(s)

Jelle Goeman: j.j.goeman@lumc.nl; Livio Finos

References

The methods used by multtest:

Holm (1979) A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6: 65-70.

Benjamini and Hochberg (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57: 289-300.

Benjamini and Yekutieli (2001) The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29 (4) 1165-1188.

The focus level method:

Goeman and Mansmann (2008) Multiple testing on the directed acyclic graph of gene ontology. Bioinformatics 24 (4) 537-544.

The inheritance method:

Meinshausen (2008) Hierarchical testing of variable importance. Biometrika 95 (2), 265-278.

For references related to applications of the test, see the vignette GlobalTest.pdf included with this package.

See Also

The gt function. The gt.object function and useful functions associated with that object.

Many more examples in the vignette!

Examples

    # Simple examples with random data here
    # Real data examples in the Vignette

    # Random data: covariates A,B,C are correlated with Y
    set.seed(1)
    Y <- rnorm(20)
    X <- matrix(rnorm(200), 20, 10)
    X[,1:3] <- X[,1:3] + Y
    colnames(X) <- LETTERS[1:10]

    # Some subsets of interest
    my.sets1 <- list(abc = LETTERS[1:3], cde  = LETTERS[3:5],
                     fgh = LETTERS[6:8], hij = LETTERS[8:10])
    res <- gt(Y, X, subsets = my.sets1)

    # Simple multiple testing
    p.adjust(res)
    p.adjust(res, "BH")

    # A whole structure of sets
    my.sets2 <- as.list(LETTERS[1:10])
    names(my.sets2) <- letters[1:10]
    my.sets3 <- list(all = LETTERS[1:10])
    my.sets <- c(my.sets2,my.sets1,my.sets3)

    # Do the focus level procedure
    # Choose a focus level by hand
    my.focus <- c("abc","cde","fgh","hij")
    # Or automated
    my.focus <- findFocus(my.sets, maxsize = 8)
    resF <- focusLevel(res, sets = my.sets, focus = my.focus)
    leafNodes(resF, alpha = .1)

    # Compare
    p.adjust(resF, "holm")

    # Focus level with a custom test
    Ftest <- function(set) anova(lm(Y~X[,set]))[["Pr(>F)"]][1]
    focusLevel(Ftest, sets=my.sets, focus=my.focus)

    # analyze data using inheritance procedure
    res <- gt(Y, X, subsets = list(colnames(X)))
    # define clusters on the covariates X
    hcl <- hclust(dist(t(X)))
    # Do inheritance procedure
    resI <- inheritance(res, sets = hcl)
    resI
    leafNodes(resI, alpha = .1)

    # inheritance procedure with a custom test
    inheritance(Ftest, sets = hcl, Shaffer=TRUE)

jellegoeman/globaltest documentation built on Dec. 29, 2021, 9:11 p.m.