gt: Global Test

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/gt.R

Description

Tests a low-dimensional null hypothesis against a potentially high-dimensional alternative in regression models (linear regression, logistic regression, poisson regression, Cox proportional hazards model).

Usage

1
2
3
4
gt(response, alternative, null, data, test.value,
     model = c("linear", "logistic", "cox", "poisson", "multinomial"), levels,
     directional = FALSE, standardize = FALSE, permutations = 0, subsets,
     weights, alias, x = FALSE, trace)

Arguments

response

The response vector of the regression model. May be supplied as a vector or as a formula object. In the latter case, the right hand side of response is passed on to alternative if that argument is missing, or otherwise to null.

alternative

The part of the design matrix corresponding to the alternative hypothesis. The covariates of the null model do not have to be supplied again here. May be given as a half formula object (e.g. ~a+b). In that case the intercept is always suppressed. When using the dot in a formula, make sure to avoid the half formula (~., but repeat the response on the left hand side. Alternatively, the alternative argument may also be given as an ExpressionSet object, in which case t(exprs(alternative)) is used for the alternative argument, and pData(alternative) is passed on to the data argument if that argument is missing.

null

The part of the design matrix corresponding to the null hypothesis. May be given as a design matrix or as a half formula object (e.g. ~a+b). The default for null is ~1, i.e. only an intercept. This intercept may be suppressed, if desired, with null = ~0. When given as a half formula object, one may also specifiy strata (e.g. ~strata(a)+b)

).

data

Only used when response, alternative, or null is given in formula form. An optional data frame, list or environment containing the variables used in the formulae. If the variables in a formula are not found in data, the variables are taken from environment(formula), typically the environment from which gt is called.

test.value

An optional vector regression coefficients to test. The default is to test the null hypothesis that all regression coefficients of the covariates of the alternative are zero. The test.value argument can be used to test a value other than zero. The coefficients are applied to the design matrix of alternative before any standardization (see the standardize argument).

model

The type of regression model to be tested. If omitted, the function will try to determine the model from the class and values of the response argument.

levels

Only used if response is factor. Selects a subset of levels(response) to be tested, given as a character vector. If a vector of length >1, the test uses only the subjects with the specified outcome categories. If levels is of length 1, the test reduces the response to a two-valued factor, testing the specified outcome category against the others combined.

directional

If set to TRUE, directs the power of the test especially against the alternative that the true regression coefficients under the alternative have the same sign. The default is that the power of the test does not depend on the sign of the true regression coefficients. Set negative weights for covariates that are expected to have opposite sign.

standardize

If set to TRUE, standardizes all covariates of the alternative to have unit second central moment. This makes sure that the test result is independent of the relative scaling of the covariates. The default is to let covariates with more variance have a greater weight in the test.

permutations

The number of permutations to use. The default, permutations = 0, uses the asymptotic distribution. The asymptotic distribution is the exact distribution in case of the linear model with normal errors.

subsets

Optional argument that can be used to test one or more subsets of the covariates in alternative. Can be a vector of column names or column indices of alternative, or a list of such vectors. In the latter case, a separate test will be performed for each subset.

weights

Optional argument that can be used to give certain covariates in alternative greater weight in the test. Can be a vector or a list of vectors. In the latter case, a separate test will be performed for each weight vector. If both subsets and weights are specified as a list, they must have the same length. In that case, weights vectors may have either the same length as the number of covariates in alternative, or the same length as the corresponding subset vector. Weights can be negative; the sign has no effect unless directional is TRUE.

alias

Optional second label for each test. Should be a vector of the same length as subsets. See also alias.

x

If TRUE, gives back the null and alternative design matrices. Default is not to return these matrices.

trace

If TRUE, prints progress information. This is useful if many tests are performed, i.e.\ if subsets or weights is a list. Note that printing progress information involves printing of backspace characters, which is not compatible with use of Sweave. Defaults to gt.options()$trace.

Details

The Global Test tests a low-dimensional null hypothesis against a (potentially) high-dimensional alternative, using the locally most powerful test of Goeman et al (2006). In this regression model implementation, it tests the null hypothesis response ~ null, that the covariates in alternative are not associated with the response, against the alternative model response ~ null + alternative that they are.

The test has a wide range of applications. In gene set testing in microarray data analysis alternative may be a matrix of gene expression measurements, and the aim is to find which of a collection of predefined subsets of the genes (e.g. Gene Ontology terms or KEGG pathways) is most associated with the response. In penalized regression or other machine learning techniques, alternative may be a collection of predictor variables that may be used to predict a response, and the test may function as a useful pre-test to see if training the classifier is worthwhile. In goodness-of-fit testing, null may be a model with linear terms fitted to the response, and alternative may be a large collection of non-linear terms. The test may be used in this case to test the fit of the null model with linear terms against a non-linear alternative.

See the vignette for extensive examples of these applications.

Value

The function returns an object of class gt.object. Several operations and diagnostic plots can be made from this object. See also Diagnostic plots.

Note

If null is supplied as a formula object, an intercept is automatically included. As a consequence gt(Y, X, Z) will usually give a different result from gt(Y, X, ~Z). The first call is equivalent to gt(Y, X, ~0+Z), whereas the second call is equivalent to gt(Y, X, cbind(1,Z)).

P-values from the asymptotic distribution are accurate to at least two decimal places up to a value of around 1e-12. Lower p-values are numerically less reliable.

Missing values are allowed in the alternative matrix only. Missing values are imputed conservatively (i.e. under the null hypothesis). Covariates with many missing values get reduced variance and therefore automatically carry less weight in the test result.

Author(s)

Jelle Goeman: j.j.goeman@lumc.nl; Jan Oosting

References

General theory and properties of the global test are described in

Goeman, Van de Geer and Van Houwelingen (2006) Journal of the Royal Statistical Society, Series B 68 (3) 477-493.

For references related to applications of the test, see the vignette GlobalTest.pdf included with this package.

See Also

Diagnostic plots: covariates, subjects.

The gt.object function and useful functions associated with that object.

Many more examples in the vignette!

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
    # Simple examples with random data here
    # Real data examples in the Vignette

    # Random data: covariates A,B,C are correlated with Y
    set.seed(1)
    Y <- rnorm(20)
    X <- matrix(rnorm(200), 20, 10)
    X[,1:3] <- X[,1:3] + Y
    colnames(X) <- LETTERS[1:10]

    # Compare the global test with the F-test
    gt(Y, X)
    anova(lm(Y~X))

    # Using formula input
    res <- gt(Y, ~A+B, null=~C+E, data=data.frame(X))
    summary(res)

    # Beware: null models with and without intercept
    Z <- rnorm(20)
    summary(gt(Y, X, null=~Z))
    summary(gt(Y, X, null=Z))

    # Logistic regression
    gt(Y>0, X)

    # Subsets and weights (1)
    my.sets <- list(c("A", "B"), c("C","D"), c("D", "E"))
    gt(Y, X, subsets = my.sets)
    my.weights <- list(1:2, 2:1, 3:2)
    gt(Y, X, subsets = my.sets, weights=my.weights)

    # Subsets and weights (2)
    gt(Y, X, subset = c("A", "B"))
    gt(Y, X, subset = c("A", "A", "B"))
    gt(Y, X, subset = c("A", "A", "B"), weight = c(.5,.5,1))

    # Permutation testing
    summary(gt(Y, X, perm=1e4))

jellegoeman/globaltest documentation built on Dec. 29, 2021, 9:11 p.m.