MDtest: Multivariate Mean Deviation Test of Means



Description

Performs a two-sample nonparametric multivariate test of means based on the minimum spanning tree (MST). It calculates the mean deviation between the cumulative distribution functions (CDFs) of sample ranks in two conditions. It tests the null hypothesis that a set of features has the same mean in two conditions against the alternative that the means differ.

Usage

MDtest(object, group, nperm=1000, pvalue.only=TRUE)

Arguments

object

a numeric matrix with columns and rows respectively corresponding to samples and features.

group

a numeric vector indicating group associations for samples. Possible values are 1 and 2.

nperm

number of permutations used to estimate the null distribution of the test statistic. If not given, a default value 1000 is used.

pvalue.only

logical. If TRUE (default), only the p-value is returned. If FALSE, a list of length three containing the observed statistic, the vector of permuted statistics, and the p-value is returned.

Details

This function tests the null hypothesis that a set of features has no difference in mean (shift) between two conditions. It performs a two-sample nonparametric multivariate test by ranking samples based on the minimum spanning tree (MST) as proposed by Friedman and Rafsky (1979). The MST of the weighted undirected graph created from the samples is found. The nodes of the MST are ranked based on their position in the MST. The MST is rooted at the node with the largest geodesic distance (rank 1) and then nodes are ranked in the High Directed Preorder (HDP) traversal of the tree (Rahmatallah et al. 2012). The mean deviation between the cumulative distribution functions (CDFs) of sample ranks in two conditions is calculated. The null distribution of the test statistic is estimated by permuting sample labels nperm times and calculating the test statistic for each permutation. The p-value is calculated as

p.value = \frac{\sum_{k=1}^{nperm} I \left[ |D_{k}| \geq |D_{obs}| \right] + 1}{nperm + 1}

where D_{k} is the test statistic for permutation k, D_{obs} is the observed test statistic, and I is the indicator function.
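The p-value formula can be illustrated with a minimal base R sketch. The helper name perm_pvalue and the numeric values below are hypothetical and chosen only for illustration; this is not the code used inside MDtest.

```r
## Permutation p-value with the +1 correction in numerator and denominator.
## d_obs:  observed test statistic (D_obs)
## d_perm: test statistics from the label permutations (D_k, k = 1..nperm)
perm_pvalue <- function(d_obs, d_perm) {
  (sum(abs(d_perm) >= abs(d_obs)) + 1) / (length(d_perm) + 1)
}

## hypothetical values, for illustration only
d_obs <- 0.8
d_perm <- c(-0.9, 0.1, 0.3, -0.2, 0.85, 0.4, -0.5, 0.0, 0.6)

## two permuted statistics are at least as extreme as d_obs,
## so the result is (2 + 1) / (9 + 1) = 0.3
p <- perm_pvalue(d_obs, d_perm)
print(p)
```

Because of the +1 terms, the smallest attainable p-value is 1/(nperm + 1), so the default nperm=1000 cannot yield a p-value below roughly 0.001.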

Value

When pvalue.only=TRUE (default), function MDtest returns the p-value indicating the attained significance level. When pvalue.only=FALSE, function MDtest produces a list of length 3 with the following components:

statistic

the value of the observed test statistic.

perm.stat

numeric vector of the resulting test statistic for nperm random permutations of sample labels.

p.value

p-value indicating the attained significance level.

Note

This function invokes function HDP.ranking, which does not work properly if any node in the MST has more than 26 links. However, this situation is almost impossible for a dataset composed of a few hundred samples or fewer.
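To check how many links the MST nodes have for a given dataset, the tree can be built and the node degrees counted. The following is a rough base R sketch under stated assumptions: it uses Prim's algorithm on Euclidean distances between samples, and the helper prim_mst is hypothetical. It is not the routine MDtest uses internally, and the distance measure used there may differ.

```r
## Prim's algorithm on a dense symmetric distance matrix (base R only).
## Returns a two-column matrix with one MST edge (pair of node indices) per row.
prim_mst <- function(d) {
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))  # start the tree at node 1
  best_dist <- d[1, ]                     # cheapest link from the tree to each node
  best_from <- rep(1L, n)                 # tree node that provides that link
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2)
  for (k in seq_len(n - 1)) {
    cand <- which(!in_tree)
    j <- cand[which.min(best_dist[cand])] # closest node not yet in the tree
    edges[k, ] <- c(best_from[j], j)
    in_tree[j] <- TRUE
    ## update the cheapest links using the newly added node
    closer <- !in_tree & d[j, ] < best_dist
    best_dist[closer] <- d[j, closer]
    best_from[closer] <- j
  }
  edges
}

set.seed(1)
## features in rows, samples in columns, as MDtest expects
dataset <- matrix(rnorm(20 * 40), nrow = 20, ncol = 40)
## Euclidean distances between samples (an assumption of this sketch)
d <- as.matrix(dist(t(dataset)))
mst_edges <- prim_mst(d)
## number of links attached to each sample in the MST
node_degree <- tabulate(as.vector(mst_edges), nbins = ncol(dataset))
print(max(node_degree))
```

If the largest node degree stays at or below 26, the limitation described in this note does not apply to that tree.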

Author(s)

Yasir Rahmatallah and Galina Glazko

References

Rahmatallah Y., Emmert-Streib F. and Glazko G. (2012) Gene set analysis for self-contained tests: complex null and specific alternative hypotheses. Bioinformatics 28, 3073–3080.

Friedman J. and Rafsky L. (1979) Multivariate generalization of the Wald-Wolfowitz and Smirnov two-sample tests. Ann. Stat. 7, 697–717.

See Also

KStest, WWtest, RKStest, RMDtest, HDP.ranking.

Examples

## generate a feature set of length 20 in two conditions
## each condition has 20 samples
## use multivariate normal distribution
library(MASS)
ngenes <- 20
nsamples <- 40
## use a zero mean vector of length 20 for both conditions
zero_vector <- array(0,c(1,ngenes))
## set the covariance matrix to be an identity matrix for both conditions
cov_mtrx <- diag(ngenes)
gp <- mvrnorm(nsamples, zero_vector, cov_mtrx)
## apply a mean shift of 3 to half of the features (1 to 10)
## in the first 20 samples, which are labeled as group 1 below
gp[1:20,1:10] <- gp[1:20,1:10] + 3
dataset <- aperm(gp, c(2,1))
## first 20 samples belong to group 1
## second 20 samples belong to group 2
pvalue <- MDtest(object=dataset, group=c(rep(1,20),rep(2,20))) 

GSAR documentation built on Nov. 8, 2020, 7:16 p.m.