# Gene Set Analysis in R

### Description

Package GSAR provides a set of statistical methods for
self-contained gene set analysis. It consists of two-sample multivariate
nonparametric statistical methods that test a null hypothesis
against specific alternative hypotheses, such as differences in shift
(functions `KStest` and `MDtest`), scale (functions `RKStest`, `RMDtest`,
and `AggrFtest`), or correlation structure (function `GSNCAtest`) between
two conditions. It also offers a graphical visualization tool for correlation
networks to examine the change in the net correlation structure of a gene
set between two conditions (function `plotMST2.pathway`).
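
As a rough illustration of how the two-sample tests are typically called, the sketch below simulates a small gene set and runs one shift test and the correlation-structure test. The simulated data, the variable names, and the choice of 200 permutations are assumptions made for this example. The argument names `object`, `group`, and `nperm`, and the convention of genes in rows with group labels 1 and 2, follow our reading of the package manual and should be checked against `?KStest` and `?GSNCAtest`. The calls are skipped if GSAR is not installed.

```r
# Simulated expression matrix: genes in rows, samples in columns (assumed layout).
set.seed(123)
n_genes <- 20
n_samples <- 40
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Group labels: the first 20 samples are condition 1, the rest condition 2.
group <- rep(c(1, 2), each = n_samples / 2)

# Add a mean shift to condition 2 so that the shift alternative holds.
expr[, group == 2] <- expr[, group == 2] + 1

# Run the tests only when the package is available.
if (requireNamespace("GSAR", quietly = TRUE)) {
  p_shift <- GSAR::KStest(object = expr, group = group, nperm = 200)
  p_cor   <- GSAR::GSNCAtest(object = expr, group = group, nperm = 200)
  print(c(KStest = p_shift, GSNCAtest = p_cor))
}
```

Because the p-values come from permutations, they vary with the random seed and with `nperm`.
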
The visualization scheme is based on minimum spanning trees (MSTs).
Function `findMST2` finds the union of the first
and second MSTs. The same tool also works for protein-protein
interaction (PPI) networks, where it highlights the most essential
interactions among proteins and reveals fine network structure, as shown in
Zybailov et al. 2016. Function `findMST2.PPI` finds the union of the first
and second MSTs of PPI networks.
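
To make the "union of the first and second MSTs" concrete, here is a minimal base-R sketch of the idea. It is not the package's implementation. The helper `prim_mst`, the simulated data, and the use of the correlation distance `1 - |r|` as edge weight are assumptions made for illustration. The second MST is taken to be the MST of the graph that remains after the edges of the first MST are removed.

```r
# Prim's algorithm on a dense symmetric weight matrix (hypothetical helper).
# Inf marks a missing edge. Returns an (n - 1) x 2 matrix of vertex indices.
prim_mst <- function(w) {
  n <- nrow(w)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2)
  for (k in seq_len(n - 1)) {
    # Weights of all edges that cross from the tree to the remaining vertices.
    sub <- w[in_tree, !in_tree, drop = FALSE]
    if (!is.finite(min(sub))) stop("graph is disconnected; no spanning tree")
    idx <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    from <- which(in_tree)[idx[1]]
    to <- which(!in_tree)[idx[2]]
    edges[k, ] <- c(from, to)
    in_tree[to] <- TRUE
  }
  edges
}

# Simulated gene set: 8 genes in rows, 30 samples in columns.
set.seed(1)
expr <- matrix(rnorm(8 * 30), nrow = 8,
               dimnames = list(paste0("g", 1:8), NULL))

# Assumed edge weight: correlation distance between genes.
d <- 1 - abs(cor(t(expr)))
diag(d) <- Inf

# First MST of the complete correlation network.
mst1 <- prim_mst(d)

# Remove the first MST's edges (both directions), then find the second MST.
d2 <- d
d2[mst1] <- Inf
d2[mst1[, 2:1, drop = FALSE]] <- Inf
mst2_only <- prim_mst(d2)

# MST2 is the union of both edge sets.
mst2 <- rbind(mst1, mst2_only)
data.frame(from = rownames(d)[mst2[, 1]], to = rownames(d)[mst2[, 2]])
```

For a gene set of `n` genes, the union has `2 * (n - 1)` edges. The second tree fails to exist only when the first MST is a star, which the helper reports as a disconnected graph.
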
Some of the methods available in this package were
proposed in Rahmatallah et al. 2014 and Friedman and Rafsky 1979. The
performance of the methods available in this package was thoroughly
tested using simulated data and microarray datasets in
Rahmatallah et al. 2012 and Rahmatallah et al. 2014. These methods can
also be applied to RNA-Seq count data, provided that proper normalization is
used. Proper normalization must take into account both within-sample
differences (mainly gene length) and between-sample differences
(library size or sequencing depth). However, because count data often
follow the negative binomial distribution, special attention should be
paid when applying the variance tests (`RKStest`, `RMDtest`, and
`AggrFtest`). The variance of the
negative binomial distribution is a function of its mean, and
multivariate tests of variance designed specifically for RNA-Seq count
data are virtually unavailable. The performance of the variance tests in this
package on count data depends heavily on the normalization used and
currently remains under-explored.
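
As one possible illustration of a normalization that accounts for both gene length and library size, the base-R sketch below converts raw counts to transcripts per million (TPM) and log-transforms the result. TPM is chosen here only as an example and is not prescribed by the package. The function `counts_to_tpm`, the toy counts, and the gene lengths are hypothetical.

```r
# Convert a genes x samples count matrix to TPM (hypothetical helper).
# Assumes every gene length is positive and every sample has a nonzero total.
counts_to_tpm <- function(counts, gene_length_bp) {
  stopifnot(nrow(counts) == length(gene_length_bp),
            all(gene_length_bp > 0))
  # Within-sample step: reads per kilobase removes the gene-length effect.
  rate <- sweep(counts, 1, gene_length_bp / 1000, "/")
  # Between-sample step: scale each sample so its values sum to one million.
  sweep(rate, 2, colSums(rate), "/") * 1e6
}

# Toy data: 3 genes in rows, 2 samples in columns.
counts <- matrix(c(10, 20, 30,
                   40,  0, 60),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
gene_length_bp <- c(1000, 2000, 500)

tpm <- counts_to_tpm(counts, gene_length_bp)

# A log transform is commonly applied before running tests on such data.
log_tpm <- log2(tpm + 1)
log_tpm
```

The resulting matrix keeps genes in rows and samples in columns. Whether this or another normalization suits the variance tests is exactly the open question raised above.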

### Author(s)

Yasir Rahmatallah <yrahmatallah@uams.edu>, Galina Glazko <gvglazko@uams.edu>

Maintainer: Yasir Rahmatallah <yrahmatallah@uams.edu>, Galina Glazko <gvglazko@uams.edu>

### References

Rahmatallah Y., Emmert-Streib F. and Glazko G. (2014) Gene sets net
correlations analysis (GSNCA): a multivariate differential coexpression test
for gene sets. Bioinformatics **30**, 360–368.

Rahmatallah Y., Emmert-Streib F. and Glazko G. (2012) Gene set analysis for
self-contained tests: complex null and specific alternative hypotheses.
Bioinformatics **28**, 3073–3080.

Friedman J. and Rafsky L. (1979) Multivariate generalization of the
Wald-Wolfowitz and Smirnov two-sample tests. Ann. Stat. **7**, 697–717.

Zybailov B., Byrd A., Glazko G., Rahmatallah Y. and Raney K. (2016) Protein-protein interaction analysis for functional characterization of helicases. Methods, doi:10.1016/j.ymeth.2016.04.014.

### See Also

`igraph`.