# AggrFtest: Aggregated F-Test of Variance Using Fisher's Probability... In GSAR: Gene Set Analysis in R

## Description

Performs a two-sample nonparametric test of variance. The univariate F-test is applied to every gene in the gene set, and the resulting p-values are aggregated using Fisher's probability combining method to form the test statistic. The null distribution of the test statistic is estimated by permuting sample labels and recalculating the test statistic a large number of times. The test evaluates the null hypothesis that no gene shows a significant difference in variance between two conditions against the alternative hypothesis that at least one gene shows a significant difference in variance between the two conditions according to the F-test.

## Usage

    AggrFtest(object, group, nperm=1000, pvalue.only=TRUE)

## Arguments

| Argument | Description |
| --- | --- |
| object | a numeric matrix with columns and rows respectively corresponding to samples and features (genes). |
| group | a numeric vector indicating group associations for samples. Possible values are 1 and 2. |
| nperm | a numeric value indicating the number of permutations used to estimate the null distribution of the test statistic. If not given, a default value of 1000 is used. |
| pvalue.only | logical. If TRUE (default), the p-value is returned. If FALSE, a list of length three containing the observed statistic, the vector of permuted statistics, and the p-value is returned. |
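
Expression data is often stored with samples in rows and a text label per sample, so a short sketch of reshaping it into the `object` and `group` arguments may help. The data frame `expr_df` and the vector `condition` below are hypothetical inputs invented for illustration; only the required layout (genes in rows, samples in columns, groups coded 1 and 2) comes from the argument descriptions above.

```r
# Hypothetical input: a samples-by-genes data frame and one condition label per sample
set.seed(42)
expr_df <- as.data.frame(matrix(rnorm(6 * 4), nrow = 6,
                                dimnames = list(paste0("sample", 1:6),
                                                paste0("gene", 1:4))))
condition <- c("control", "control", "control", "treated", "treated", "treated")

# Genes must be in rows and samples in columns, so transpose
object <- t(as.matrix(expr_df))

# Recode the text labels to the numeric values 1 and 2
group <- as.integer(factor(condition, levels = c("control", "treated")))

dim(object)   # 4 genes by 6 samples
group         # 1 1 1 2 2 2
```

The number of columns of `object` should equal the length of `group`, with entry `i` of `group` describing column `i` of `object`.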

## Details

This function tests the null hypothesis that none of the genes in a gene set shows a significant difference in variance between two conditions according to the F-test against the alternative hypothesis that at least one gene shows a significant difference in variance according to the F-test. It performs a two-sample nonparametric test of variance by applying the univariate F-test to every gene in the set, adjusting for multiple testing using the Benjamini and Hochberg method (also known as FDR) as shown in Benjamini and Hochberg (1995), and then aggregating the adjusted p-values using Fisher's probability combining method to obtain a test statistic (T) for the gene set:

$$T = -2 \sum_{i=1}^{p} \log_{e}(p_{i})$$

where $p_{i}$ is the adjusted p-value of the univariate F-test for gene $i$ and $p$ is the number of genes in the set. The null distribution of the test statistic is estimated by permuting sample labels nperm times and calculating the test statistic T for each permutation. The p-value is calculated as

$$\mathrm{p.value} = \frac{\sum_{k=1}^{\mathrm{nperm}} I\left[ T_{k} \geq T_{obs} \right] + 1}{\mathrm{nperm} + 1}$$

where $T_{k}$ is the test statistic for permutation $k$, $T_{obs}$ is the observed test statistic, and $I$ is the indicator function.
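
The procedure described in this section can be sketched in base R as follows. This is a minimal illustration written from the description above, not the GSAR source code: the helper names `fisher_stat` and `aggr_ftest_sketch` are hypothetical, and it assumes the per-gene F-test is the one provided by `var.test` and the adjustment is `p.adjust` with `method = "BH"`.

```r
# Hypothetical helper: Fisher's combining statistic T for one labelling of the samples
fisher_stat <- function(object, group) {
  # Per-gene two-sample F-test of equal variances
  pvals <- apply(object, 1, function(x) {
    var.test(x[group == 1], x[group == 2])$p.value
  })
  # Benjamini-Hochberg adjustment, then T = -2 * sum(log(p_i))
  padj <- p.adjust(pvals, method = "BH")
  -2 * sum(log(padj))
}

# Hypothetical helper: permutation p-value for the gene set
aggr_ftest_sketch <- function(object, group, nperm = 1000) {
  t_obs <- fisher_stat(object, group)
  # Null distribution: recompute T after shuffling the sample labels
  t_perm <- replicate(nperm, fisher_stat(object, sample(group)))
  p_value <- (sum(t_perm >= t_obs) + 1) / (nperm + 1)
  list(statistic = t_obs, perm.stat = t_perm, p.value = p_value)
}

# Simulated data: 20 genes, 20 samples per group, larger spread in group 2
set.seed(1)
x <- cbind(matrix(rnorm(20 * 20), nrow = 20),          # group 1: sd = 1
           matrix(rnorm(20 * 20, sd = 2), nrow = 20))  # group 2: sd = 2
res <- aggr_ftest_sketch(x, group = rep(1:2, each = 20), nperm = 200)
res$p.value
```

Adding 1 to both the numerator and the denominator means the estimated p-value can never be exactly zero; its smallest possible value is 1/(nperm + 1).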

## Value

When pvalue.only=TRUE (default), function AggrFtest returns the p-value indicating the attained significance level. When pvalue.only=FALSE, function AggrFtest produces a list of length 3 with the following components:

| Component | Description |
| --- | --- |
| statistic | the value of the observed test statistic. |
| perm.stat | numeric vector of the resulting test statistic for nperm random permutations of sample labels. |
| p.value | p-value indicating the attained significance level. |
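
As a rough illustration of working with the three components, the sketch below builds a stand-in list with the same component names and inspects it. The numbers are simulated placeholders (the chi-squared draws merely mimic a set of permuted statistics) and do not come from an actual call to AggrFtest.

```r
# Stand-in for the list returned when pvalue.only=FALSE; all values are simulated
set.seed(7)
res <- list(statistic = 55,
            perm.stat = rchisq(999, df = 40))
res$p.value <- (sum(res$perm.stat >= res$statistic) + 1) /
  (length(res$perm.stat) + 1)

# Number of permutations that were run
length(res$perm.stat)

# Fraction of permuted statistics that fall below the observed statistic
mean(res$perm.stat < res$statistic)

# Spread of the estimated null distribution
summary(res$perm.stat)
```

Keeping the full list instead of only the p-value makes it possible to plot or summarize the permutation distribution, for example with `hist(res$perm.stat)`.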

## Author(s)

Yasir Rahmatallah and Galina Glazko

## References

Benjamini Y. and Hochberg Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289–300.

## See Also

RKStest, RMDtest.

## Examples

    ## generate a feature set of length 20 in two conditions
    ## each condition has 20 samples
    ## use multivariate normal distribution
    library(MASS)
    library(GSAR)
    ngenes <- 20
    nsamples <- 40
    ## let the mean vector have zeros of length 20 for both conditions
    zero_vector <- array(0,c(1,ngenes))
    ## set the covariance matrix to be an identity matrix for condition 1
    cov_mtrx <- diag(ngenes)
    gp1 <- mvrnorm((nsamples/2), zero_vector, cov_mtrx)
    ## set some scale difference in the covariance matrix for condition 2
    cov_mtrx <- cov_mtrx*3
    gp2 <- mvrnorm((nsamples/2), zero_vector, cov_mtrx)
    ## combine the data of two conditions into one dataset
    gp <- rbind(gp1,gp2)
    ## transpose so that rows are genes and columns are samples
    dataset <- aperm(gp, c(2,1))
    ## first 20 samples belong to group 1
    ## second 20 samples belong to group 2
    pvalue <- AggrFtest(object=dataset, group=c(rep(1,20),rep(2,20)))