Generalized Mann-Whitney Type Tests

Share:

Description

The function calculates p-values for different tests as presented in the paper "Generalized Mann-Whitney Type Tests for Microarray Experiments".

Usage

1
2
3
  gmw(X,g,goi=NULL,test="mw",type="permutation",prob="pair",nper=2000,
  alternative="greater",mc=1,output="min", cluster=NULL, order = TRUE,
  keepPM= FALSE, mwAkw=FALSE, alg=NULL)

Arguments

X

Data matrix, each column corresponds to a variable, each row to an individual. Can also be a vector (one variable).

g

Vector of length nrow(X) (respective length(X)), assigning treatment groups (numbers) to observations, see details.

goi

Vector with elements of g, defining for which treatment groups the test should be performed.

test

Specifies the test statistic.

type

Permutation test ("permutation") or asymptotic tests("asymptotic") for the calculation of the p-values. Tests implemented in R-base "external" are also accessible, see details.

prob

This option is only for the Mann-Whitney test, see details.

nper

If type is "permutation" this option specifies how many permutations are used to calculate the p-value.

alternative

Specifies the alternative, the options are "smaller","greater" and "two.sided", see details.

mc

Multiple Cores, determines how many tests will be performed parallel (only available under Linux), see details.

output

Determines the level of the details in the output.

cluster

A vector of same length as g, giving possible cluster information for performing the permutation test.

order

Boolean, shall all orders be calculated or only increasing orders?

keepPM

Boolean, keep the permutation matrices, required for Westfall & Young multiple testing adjustment.

mwAkw

Boolean or numeric, if TRUE pairwise Mann-Whitney tests are performed after the Kruskal-Wallis test. If numeric MW tests are only performed for KW tests with smaller p-value than that value.

alg

Internal function, what permutation algorithm should be used. Shouln't be changed by the user.

Details

The object X is the data vector (one variable) or the data matrix. Each row refers to an observation, and each column to a variable. The tests are performed separately for all variables.

The vector g gives the group number. The directional tests are based on this numbering of the group.

The goi option defines, which treatment groups are used in the test constructions. If no groups are specified (default), all groups are used.

The test option specifies the test statistic. Possible options are 'uit'(union intersection test), 'triple' (test based on triple indicator functions), 'jt' for Jonckheere-Terpstra test, 'jt*' for a modified Jonckheere-Terpstra test, 'mw' for the Mann-Whitney / Wilcoxon test and 'kw' for the Kruskal-Wallis test. See also reference [1] for further details.

The option type is used to decide how the p-values are computed. For all tests are permutation type tests available and the option for that is type="permutation". In addition for test='mw', test='kw' or test='jt' also the option type='external' is available. This calls then the code from the base system or other, imported packages. For test='uit' there is also an asymptotic test (type="asymptotic") available. For test='triple' or test='jt*' asymptotic implementations are currently under development.

The prob option is only for the Mann-Whitney test. For the option "single", the tests are to compare a single group versus all the other groups. The option "pair" makes all pair-wise comparisons between the groups.

The option alternative is used to specify whether one-sided or two-sided alternatives are used. If the test is based on the PIs, the option "greater" for example means that, according to the alternative, the groups with larger group numbers tend to have larger observations as well. The function createGroups may be used to renumber the groups, if needed.

The mc option is only valid if X is a matrix and the used OS is Linux, because the parallelisation is based on the package parallel, and that again is based on the concept of forking, which is currently only supported under Linux.

The option output can be used to control how detailed the output is. The default "min" reports just the matrix of p-values in a matrix (columns=variables, rows=tests). If output="full", a list will be returned with items containing full test objects of class htest.

The option cluster is an additional object for the Kruskal-Wallis permutation test. For cluster-dependent observation, only the permutations within clusters are acceptable for the p-value calculation.

In the getSigTests function it is possible to apply the Westfall & Young multiple testing method. In this approach the permutation matrix is used to adjust for multiple testing, hence if one wishes to apply this method, the only option for type is "permutation". In addition the boolean flag keepPM has to be set to TRUE. Is default is to drop the permutation matrix after each run in order to save memory.

If a Kruskal-Wallis test is performed, there is also the option to perform afterwards paiwise Mann-Whitney tests to identify concrete, deviating groups. If one wishes to do that just for significant variables one can set the option mwAkw to the corresponding significance level. If mwAkw is set to TRUE (or respective 1) the Mann-Whitney tests are performed for all variables.

There is also a function to choose the used calculation algorithm, options here are "Rsubmat", "Rnaive", "Csubmat", "Cnaive". The purpose is just for validation.

Value

A matrix or vector of p-values of the underlying hypothesis test(s). In case of output="full" we give a list, and each list item contains the htest object for the column-wise performed test.

Author(s)

Daniel Fischer

References

Fischer, D., Oja, H., Schleutker, J., Sen, P.K., Wahlfors, T. (2013): Generalized Mann-Whitney Type Tests for Microarray Experiments, Scandinavian Journal of Statistic, doi: 10.1111/sjos.12055.

Daniel Fischer, Hannu Oja (2015). Mann-Whitney Type Tests for Microarray Experiments: The R Package gMWT. Journal of Statistical Software, 65(9), 1-19. URL http://www.jstatsoft.org/v65/i01/.

Westfall, P.H. and Young, S.S. (1993): Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley, New York.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
  X <- c(sample(15))
  X <- c(X,101,102,103)
  g <- c(1,1,1,2,2,2,2,3,3,3,4,4,4,4,4,5,5,5)
  cluster=c(rep(c(1,2),9))

  gmw(X,g,test="kw",type="external")
  gmw(X,g,test="kw",type="permutation")
  gmw(X,g,test="kw",type="permutation",cluster=cluster)

  gmw(X,g,test="jt",type="permutation")